ISSN  0316-6295 


Satisfying  Database  States 

by 

Marc  H.  Graham 
Technical  Report  CSRG-137 

December,  1981 


COMPUTER  SYSTEMS  RESEARCH  GROUP 

UNIVERSITY  OF  TORONTO 




The Computer Systems Research Group (CSRG) is an interdisciplinary group
formed to conduct research and development relevant to computer systems and
their application. It is jointly administered by the Department of Electrical
Engineering and the Department of Computer Science of the University of
Toronto, and is supported in part by the Natural Sciences and Engineering
Research Council of Canada.


©  Copyright  1981  by  Marc  H.  Graham  and  the  Computer  Systems  Research 
Group,  University  of  Toronto. 


Digitized  by  the  Internet  Archive 
in  2018  with  funding  from 
University  of  Toronto 


https://archive.org/details/technicalreportc137univ 


As Gödel declared over and over,

"Everything must be translated into
logical calculus before being eaten."


Woody  Allen 


'I once asked the distinguished anatomist, Dr. Z., what he
did upon encountering mathematical formulations he did not quite
understand. "I hum them," he said.'


P.  B.  Medawar 


Abstract 


The theory of relational databases has been developed, for the most part,
under a highly controversial assumption known as the "universal instance" or
"pure universal relation" assumption. Although this assumption is unrealistic, it
is useful to the theory because it provides an integrated view of the states of a
multi-relation database. A significantly weaker assumption, the "universal scheme"
assumption, has been proposed by Fagin, Mendelzon and Ullman, without an
indication of its effects on states of the database. Independently, Honeyman
proposed the notion of "weak instance" as a proper definition of a database
state's satisfying a set of functional dependencies. We give a formalization of
the universal scheme assumption which differs from that of Fagin et al. and
show that the weak instance definition is a natural consequence of this
formalization. This remains true even when the database is constrained by any set of
implicational dependencies.

Having provided this logical foundation, we investigate prior results under
the new definition of satisfaction. We show that the results of Aho, Sagiv and
Ullman concerning expression and tableaux equivalence carry over. We then turn
to the concept of representation of dependencies as suggested by Bernstein. We
give two different definitions of a functional dependency's representation by a
schema, show that they differ, and show that "cover embedding" is a stronger
requirement than necessary for all of a set of dependencies to be represented
by a schema. Finally, we investigate those schemas without interrelational
constraints. We show that the ideas first put forward by Rissanen are not the
correct ones for this new notion of satisfaction.


Acknowledgments. 


I  am  deeply  grateful  for  the  many  long  and  fruitful  discussions  I  have  had 
with  Alberto  Mendelzon.  I  hope  that  our  relationship  will  be  as  productive  in  the 
future  as  it  has  been  in  the  past. 

Many other people have also listened to and commented on my problems
over the years. They include Romas Aleliunas, Maria Klawe, Yannis Vassiliou,
Peter Honeyman, Kent Laver and Dennis Tsichritzis. I thank them all.

This research was partially supported by the Natural Sciences and Engineering
Research Council of Canada.


A note to the reader.


Nothing makes for as uninteresting reading as a list of definitions. Likewise,
no reading is as impenetrable as jargon undefined. To balance these opposing
forces, I have assembled most of the more basic definitions into the appendix. A
reader familiar with the area of this thesis (a glimpse through the references
will help him decide that issue) may choose to start at the beginning, with
chapter 1. A reader without such familiarity might be advised to begin at the
end, surveying the lay of the land in the appendix.

Not all terms are defined in the appendix. Terms invented for these
investigations are defined as they arise in the text. The central concepts of tableaux
and the chase, although not original to this thesis, are also defined, and
redefined when needed, within the text. No index of definitions has been
prepared; hopefully, the Table of Contents will serve as a rough map to concepts
the reader may have misplaced.

Some terms are used without definition. I have done this whenever I felt
they were part of the vocabulary of any educated reader (e.g., graph), when their
use is much shorter than their definition (e.g., ordinal), or when nothing formal is
said about them. This last explains why no definition of 'normal forms' may be
found in this work, even though the idea is occasionally mentioned.


Table of Contents

1. An Historical Introduction
1.1 Pre-History
1.2 History
1.3 Outline of the thesis

2. Logical Foundations
2.1 The Universal Relation Scheme Assumption
2.2 Dependencies
2.3 The weak instance, tableaux, and the chase
2.4 Completeness
2.5 Does the Universal Relation Scheme Assumption Hold?

3. Strong Expression Equivalence
3.1 Introduction
3.1.1 An Example
3.2 Tagged Tableaux, valuations
3.3 Containment
3.4 The chase of a tagged tableau
3.5 Results
3.6 The example revisited

4. Functions in Databases
4.1 Introduction
4.2 Functional dependencies within one relation
4.3 Functional dependencies in multi-relation databases
4.4 Representation by a schema
4.5 Functional dependencies as constraints
4.6 Other approaches
4.7 Summary

5. Maintenance
5.1 Introduction
5.2 A worst case lower bound
5.3 Independence
5.3.1 Introduction, Sagiv Independence
5.3.2 Independence in the general case
5.3.2.1 Weak Cover embedding
5.3.2.2 Consistency
5.3.3.4 Additional properties of independent schemas

6. Summary and Future Work
6.1 Summary
6.2 Future work

Appendix - Definitions
A.1 Basics
A.2 The Relational Algebra
A.3 Interpretation
A.4 Dependencies
A.5 Some elementary notions of logic
A.6 Dependencies, again


References 


Chapter  1 

An  Historical  Introduction 


1.1.  Pre-History 

By the late 1960's, computing equipment and applications had evolved to
the point that it became feasible and desirable to store and process large
quantities of data. It soon became apparent that the intrinsic value of that data and
its cost justified treating the data as a resource or entity in itself, as the hard-
and software which processed the data were already being treated. Thus began
the creation, use and study of database management systems.

Among the earliest authors in the field, three serve to illustrate the
mainstreams of activity over the ensuing decade.

Charles Bachman [Bach], then working for Honeywell, designed a database
management system called IDS which evolved into the CODASYL standard
[COD78]. These are essentially record management systems. The details are not
important to us, except to note that these and related systems* account among
them for almost all installed and functioning database management systems and
almost none of the academic literature.

Mike Senko [Senko], then of IBM, developed a system later called DIAM.
This system is representative of the 'semantic modelling' or 'knowledge base'
approach to database management. (It was not the first representative of the
approach; see also Langefors [Lang].) This approach focuses its attention on the
'meaning' of the data. Whereas the commercial systems concern themselves
with the manipulation and description of computerized data, the semantic
systems concern themselves with the description and manipulation of knowledge.
There is a great deal of literature about semantic modelling and very few
systems which utilize it.

*Such as IMS, TOTAL, IDMS, etc.

E.F. Codd introduced the so-called 'relational model' in 1970 [C1]. The
relational approach has the distinctive feature of mathematical tractability: it lends
itself readily to the statement of theorems, and such theorems are the subject and
object of this thesis. Therefore, we use this approach exclusively. Mathematical
tractability is not a necessary virtue of a database management system, nor is
mathematical simplicity to be confused with 'ease of use'. I do not suggest that
the relational model is the proper and correct way to do database management.

A considerable debate raged through the first years of the 1970's among the
adherents of the commercial, semantic and relational approaches. (See [Rustin]
and [KIKo].) At times the debate had more the aspect of a religious war than of
a scientific investigation. One hears very little of this debate today. Such
tempests cannot last forever; potential participants now listen only to their
co-religionists. Still, these three approaches each offer insights into the
nature of the matter.

Commercial systems manage data, which are recordings of information.
Knowledge bases manage knowledge, which is somewhat ephemeral, existing
independently of any mechanical recording. The theory of databases, of which
this thesis forms a part, studies data which function to record knowledge.
Knowledge requires a 'knowing agent', hereinafter called the user. Once
recorded, knowledge becomes data. This thesis investigates neither the
mechanics of human knowledge engineering nor the mechanics of computerized data
manipulation. Its primary goal is to suggest a solution to a serious flaw in the
theory and to investigate the consequences of that solution.

1.2.  History 

In 1972 Codd published two papers which between them touch on all of the
theory of databases. The first of these, "Further normalization" [C2], discusses
aspects of database description. The second, "Relational completeness" [C3],
deals with the manipulation of a database. The description and manipulation of
data are exactly the subject matter of the theory of databases. In this thesis we
offer some results in both areas, but description will occupy us primarily.

In [C2] Codd introduced the 'normal forms'. We will illustrate the concept
by example.

Example 1.1. Suppose we wish to record the employees and managers of
each of several departments. Suppose we construct for this purpose a
relation scheme*, R = EDM, representing the predicate "employee works in
department for manager". Suppose also that each employee works in but
one department and each department has but one manager. That is, we
have E->D and D->M. Then M is said to 'transitively depend' upon E; that is,
an employee has a unique manager by virtue of the unique department by
which they are linked. The scheme R is said to violate 3rd normal form, and
consequently to exhibit the following 'update anomalies':

• the manager of a department is stored redundantly with every
employee (maintenance anomaly)

• a department cannot be created and a manager assigned until an
employee is hired into it (insert and delete anomalies).

The recommendation is that R be replaced by R1 = ED, R2 = DM 'without loss
of information.' ■
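The dependencies of Example 1.1 can be checked mechanically. The minimal Python sketch below, which is not taken from this thesis, tests whether a relation instance satisfies a functional dependency X->Y: no two tuples may agree on X while disagreeing on Y. The employee data, the function name `satisfies_fd`, and the representation of tuples as dicts keyed by single-letter attribute names are all hypothetical, chosen only for illustration.

```python
def satisfies_fd(rows, lhs, rhs):
    """Return True if the instance `rows` satisfies the fd lhs -> rhs.

    rows: iterable of dicts mapping attribute names to values.
    lhs, rhs: strings of single-letter attribute names, e.g. "E" or "DM".
    """
    seen = {}  # maps each lhs-value to the first rhs-value observed with it
    for t in rows:
        key = tuple(t[a] for a in lhs)
        val = tuple(t[a] for a in rhs)
        if seen.setdefault(key, val) != val:
            return False  # two tuples agree on lhs but differ on rhs
    return True


# Hypothetical instance of R = EDM: e1 and e2 share department d1,
# so d1's manager m1 is stored twice (the maintenance anomaly).
r = [
    {"E": "e1", "D": "d1", "M": "m1"},
    {"E": "e2", "D": "d1", "M": "m1"},
    {"E": "e3", "D": "d2", "M": "m2"},
]

print(satisfies_fd(r, "E", "D"), satisfies_fd(r, "D", "M"))  # True True
print(satisfies_fd(r, "E", "M"))                             # True

# Hiring e4 into d1 under a different manager breaks D -> M.
r_bad = r + [{"E": "e4", "D": "d1", "M": "m9"}]
print(satisfies_fd(r_bad, "D", "M"))                         # False
```

Because m1 appears in every tuple for d1, any update that changes only one copy produces exactly the kind of violation shown in the last line.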

The phrase 'without loss of information' will occupy us for the remainder of
the thesis. We wish to know what it means for information to be present or
absent. Before continuing the history of this idea, we would like to comment on
the notion of 'anomaly'.

*See the appendix for definitions and conventional notations.

The progression of normal forms, which Fagin extended to include 4th
normal form [F2] and PJ/NF (read: project-join normal form*) [F3], has always been
justified as the elimination of anomalies. It is the deduction from one piece of
semantics, the dependency information, of another, the isolation of unwanted
behaviour. Has it been successful? In example 1.1, observe that the points
highlighted are not consequences of the dependencies. They would be true no
matter what the numerical relations of employees, departments and managers.
Conversely, what justifies the belief that these points are significant? Perhaps
departments will never be opened or closed. Perhaps the need to quickly
retrieve an (employee, manager) pair overrides the storage concerns. Although a
schema in PJ/NF enjoys certain pleasing properties (see [F3]), these properties
must be traded off against other properties. A 'correct' schema may not be
in normal form.

We return to the mainline of this history to recall from example 1.1 our
willingness to conclude E->M from E->D and D->M. It seems incontrovertible that a
function from employees to departments composes with a function from
departments to managers to form a function from employees to managers. This
reasoning is a major tool in the body of this thesis. In general, the question may be
asked: from a given set of dependencies asserted to be true, what other
dependencies may we conclude must also be true? This is the implication problem.
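For functional dependencies in the classical single-relation setting, the implication problem has a well-known decision procedure: compute the closure X+ of an attribute set X under F by repeatedly firing any dependency whose left side is already contained in the set; F then implies X->Y exactly when Y ⊆ X+. The Python sketch below is the standard textbook closure algorithm, not a procedure from this thesis; the function names `closure` and `implies` are hypothetical, and attribute names are assumed to be single letters.

```python
def closure(attrs, fds):
    """Closure of the attribute set `attrs` under the fds in `fds`.

    fds: list of (lhs, rhs) pairs of attribute strings, e.g. ("E", "D").
    """
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Fire lhs -> rhs once its left side has been derived.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def implies(fds, lhs, rhs):
    """True if fds imply lhs -> rhs (classical single-relation implication)."""
    return set(rhs) <= closure(lhs, fds)


F = [("E", "D"), ("D", "M")]
print(sorted(closure("E", F)))  # ['D', 'E', 'M']
print(implies(F, "E", "M"))     # True: the composed function of Example 1.1
print(implies(F, "M", "E"))     # False
```

The loop only ever adds attributes, so it terminates after at most one pass per attribute of the universe.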

The implication problem for functional dependencies provided the subject
for the first important papers in the theory of databases. Armstrong [A]
presented inference rules for such implications. If F is a set of functional
dependencies, let F+ be the set of all dependencies which are implied by F according
to Armstrong's rules. Armstrong proved his rules to be complete by proving the
following theorem.

*These forms fit a progression. A scheme in PJ/NF is in 4NF, is in BCNF (read:
Boyce-Codd normal form), is in 3NF, etc.

Theorem 1.1. There exists a relation instance r such that the functional
dependencies satisfied by r are exactly those in F+. ■

From  our  point  of  view,  the  use  of  an  instance  of  a  relation  scheme  on  all 
the  attributes  of  the  dependencies  is  the  crucial  feature  of  this  theorem.  A 
major  result  of  this  thesis  is  that  under  the  notion  we  present  of  a  multi-relation 
database  state  satisfying  a  set  of  dependencies,  Armstrong’s  rules  are  in  a  sense 
incomplete. 

Armstrong's result allowed Bernstein to develop a schema design algorithm
with the following features. Given a set F of functional dependencies, Bernstein's
algorithm constructs a schema R such that

1) each scheme in R is in 3rd normal form;

2) the set of key dependencies of R, which is the subset of F+ of the form

$$\bigcup_{R \in \mathbf{R}} \{\, X \to R \mid X \subseteq R \,\},$$

is a cover of F;

3) R is minimal, in number of schemes, among all schemas satisfying 1 and 2.

It was the property of being cover embedding which, Bernstein argued,
assured that these schemas represented the information to be represented. Fagin
[F1] pointed out that the design methodology employed by the algorithm
differed from that described by Codd. Of pre-eminent concern to both Bernstein
and Fagin was the uniqueness assumption, the assertion that there exists at
most one function between any two sets of attributes. The meaning of the E->M
dependency of example 1.1 must be exactly the composition of the meanings of
E->D and D->M. We will be examining this idea more closely, and we will discover
that it may not be as internally consistent as it appears. Fagin suggested that
the universal relation scheme, the scheme containing all the attributes of the
universe, be used as the starting point of a decompositional approach to schema
design. Within the universal scheme, he argued, the uniqueness assumption is
necessarily valid. This truism we will discover to be less than completely true.

More  directly,  Bernstein's  algorithm  depends  on  Armstrong’s  theorem 
which  in  turn  discusses  instances  of  a  relation  scheme.  Fagin's  comment  is 
redundant  in  that  Bernstein,  whether  he  wished  to  or  not,  was  operating  from  a 
universal  relation. 

It remained to Aho, Beeri and Ullman [ABU] to isolate a definition of
information preservation with respect to which Bernstein's algorithm was deficient.
We will describe their idea with an example.


Example 1.2. Suppose that the following relation instance gives a true
picture of the relationships holding among individual employees, departments
and managers.

Now suppose we wish to store this information as the state of the schema
{EM, DM}. So as to store everything we know to be true, we store

E  M
1  1
2  1

From this state, answer this question: Which employees work for which
departments? The answer we get is

$$\pi_{ED}(EM * DM) = \{\langle 1,1\rangle, \langle 1,2\rangle, \langle 2,1\rangle, \langle 2,2\rangle\},$$

which is not the answer we would have gotten from the original instance. By
presumption, this answer is wrong. ■
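The original instance of Example 1.2 is not legible in this copy. The Python sketch below assumes, hypothetically, the instance {<1, 1, 1>, <2, 2, 1>} over EDM, which agrees with the EM relation shown and reproduces the wrong answer given in the example. The helpers `project` and `natural_join`, and the encoding of a relation as an attribute string plus a set of tuples, are illustrative assumptions.

```python
def project(attrs, rows, onto):
    """Project a relation (attribute string, set of tuples) onto `onto`."""
    idx = [attrs.index(a) for a in onto]
    return {tuple(row[i] for i in idx) for row in rows}


def natural_join(a_attrs, a_rows, b_attrs, b_rows):
    """Natural join; returns (attribute string, set of tuples)."""
    common = [x for x in a_attrs if x in b_attrs]
    extra = [x for x in b_attrs if x not in a_attrs]
    out = set()
    for r in a_rows:
        for s in b_rows:
            if all(r[a_attrs.index(c)] == s[b_attrs.index(c)] for c in common):
                out.add(r + tuple(s[b_attrs.index(x)] for x in extra))
    return a_attrs + "".join(extra), out


# Hypothetical "true" instance over EDM (an assumption, see above).
edm = {(1, 1, 1), (2, 2, 1)}
em = project("EDM", edm, "EM")   # {(1, 1), (2, 1)}
dm = project("EDM", edm, "DM")   # {(1, 1), (2, 1)}

attrs, joined = natural_join("EM", em, "DM", dm)  # attrs == "EMD"
answer = project(attrs, joined, "ED")
print(sorted(answer))                     # [(1, 1), (1, 2), (2, 1), (2, 2)]
print(sorted(project("EDM", edm, "ED")))  # [(1, 1), (2, 2)]: the true answer
```

Joining on M pairs every employee with every department that has the same manager, which is where the spurious tuples come from.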

Aho et al. defined a lossless decomposition of a relation scheme R to be a
collection of subsets {R1, ..., Rn} of R with the property that for every legal
instance r of R

$$r = \pi_{R_1}(r) * \cdots * \pi_{R_n}(r).$$

As observed in [BMSU], if R is unconstrained, so that all instances are legal, only
the trivial decomposition {R} has the lossless property. As shown by Fagin [F4],
lossless decompositions within which each scheme is of arbitrarily strict normal
form* are always achievable. It is not always possible to find such a
decomposition which is also cover embedding, as reported in [BB]. (The canonical example
of this was known to Codd in [C2].)
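Whether a decomposition is lossless with respect to a set of functional dependencies can be decided with a tableau and the chase, using the test usually credited to Aho, Beeri and Ullman [ABU]. One builds a row per scheme, with a distinguished symbol in each column the scheme contains and a fresh symbol elsewhere, repeatedly equates symbols that the fds force to be equal, and declares the decomposition lossless exactly when some row becomes entirely distinguished. The Python sketch below is a minimal, unoptimized rendering that assumes single-letter attribute names; the function name `is_lossless` and the symbol encoding ("a:A", "b0:A") are arbitrary choices for illustration.

```python
def is_lossless(universe, schemes, fds):
    """Chase test for a lossless join of `schemes` under `fds`.

    universe: string of attribute names, e.g. "EDM".
    schemes:  list of attribute strings, e.g. ["ED", "DM"].
    fds:      list of (lhs, rhs) attribute strings.
    """
    col = {A: k for k, A in enumerate(universe)}
    # Row i: distinguished "a:A" where A is in scheme i, else unique "bi:A".
    rows = [["a:" + A if A in s else f"b{i}:{A}" for A in universe]
            for i, s in enumerate(schemes)]

    def distinguished(v):
        return v.startswith("a:")

    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            for r1 in rows:
                for r2 in rows:
                    if not all(r1[col[A]] == r2[col[A]] for A in lhs):
                        continue
                    for A in rhs:
                        x, y = r1[col[A]], r2[col[A]]
                        if x == y:
                            continue
                        # Prefer the distinguished symbol, else the smaller one.
                        if distinguished(x) or (not distinguished(y) and x < y):
                            keep, drop = x, y
                        else:
                            keep, drop = y, x
                        for r in rows:  # rename drop -> keep everywhere
                            for k, v in enumerate(r):
                                if v == drop:
                                    r[k] = keep
                        changed = True
    return any(all(distinguished(v) for v in r) for r in rows)


F = [("E", "D"), ("D", "M")]
print(is_lossless("EDM", ["ED", "DM"], F))  # True: the split of Example 1.1
print(is_lossless("EDM", ["EM", "DM"], F))  # False: the schema of Example 1.2
```

Each renaming removes one symbol from the tableau, so the chase always terminates.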

Codd indicated in "Further normalization" that the normalization process
for schemas is mirrored by the replacement of an instance by its projections.
He expected this decomposition to be inverted by the join operation. Rissanen
[R1] asked the following question. Let SAT(D) be the set of all universal
instances (instances of the universal scheme) which satisfy the constraints in D.
Let SAT(R,D) be the set of states ρ of schema R satisfying

1) there is some instance I of the universal scheme such that π_R(I) = ρ;

2) for each R ∈ R, ρ(R) satisfies those elements of D embedded in R.

SAT(R,D) is the set of join consistent (part 1), locally satisfying (part 2) states
of R. The question which Rissanen asked is: under what circumstances does a
bijection exist between SAT(D) and SAT(R,D)? If the operator taking SAT(D) to
SAT(R,D) is projection, what is the inverse operation?
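Condition 1, join consistency, can be tested directly: a state ρ is join consistent exactly when projecting the join of all its relations back onto each scheme returns that relation unchanged (the join itself then serves as I). The minimal Python sketch below applies this to the schema {ED, DM} of Example 1.1, with hypothetical data in which department d2 has a manager but no employees, which is precisely the kind of state the insert anomaly asks for. The function names are illustrative.

```python
from functools import reduce


def join(r1, r2):
    """Natural join of relations given as lists of dicts."""
    return [{**t, **s} for t in r1 for s in r2
            if all(t[a] == s[a] for a in t.keys() & s.keys())]


def is_join_consistent(state):
    """state: dict mapping a scheme (attribute string) to a set of tuples."""
    rels = [[dict(zip(scheme, row)) for row in rows]
            for scheme, rows in state.items()]
    full = reduce(join, rels)  # candidate universal instance
    return all({tuple(t[a] for a in scheme) for t in full} == set(rows)
               for scheme, rows in state.items())


# Department d2 has a manager but no employees (hypothetical data).
state = {"ED": {("e1", "d1")}, "DM": {("d1", "m1"), ("d2", "m2")}}
print(is_join_consistent(state))  # False: ("d2", "m2") is lost by the join

state["ED"].add(("e2", "d2"))
print(is_join_consistent(state))  # True
```

Note that moving between the two states required inserting a tuple into ED before (or together with) the DM tuple for d2, illustrating why SAT(R,D) is not closed under single-tuple updates.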

Rissanen, and later Beeri and Rissanen [BR], showed that the bijection
exists exactly when R is cover embedding and a lossless decomposition. He
called such a schema a set of independent components. Such a schema is the
proper result of any design process, claim the authors of [BR], as the 'whole
point with [schema design] is, of course, to be able to replace the original
scheme with a collection of the components and not to worry about the
inter-component constraints' (italics added). Regrettably, these authors have failed to
achieve that goal. The join consistency of the elements of SAT(R,D) is a powerful
interrelational constraint. In our chapter on independence, we show that being
cover embedding and lossless is neither necessary nor sufficient for a schema to
be free of interrelational constraints.

*Such a blanket statement is indefensible, considering it includes forms not yet invented.
The statement is true over the range of forms up to PJ/NF.

Codd recognized that the decomposed schema has more 'admissible' states
than the original. (The admissible states to Codd are the locally satisfying ones.)
This idea has been formalized recently by Mendelzon [Mend]. Among the new
states are those which are not join consistent. As Codd pointed out, 'an obvious
property of the class of admissible states ... is that by means of the operations
of tuple insertion and tuple deletion all the admissible states are reachable from
any given state.' This is not a property of the class SAT(R,D). To move from one
element of this class to another requires the simultaneous insertion (or
deletion) of a set of tuples. Further, given an element of SAT(R,D) and a set of
tuples to insert, it is difficult (indeed, NP-complete [HLY]) to decide if the result
of the insertion is join consistent. On the other hand, given the definition of
admissible (or satisfying) state we will present, it is possible to move from any
admissible state to any other through the insertion and/or deletion of single
tuples, with each intermediate state being admissible. The difficulty of
determining membership in the class of admissible states depends upon the form of the
dependencies. To move from an inadmissible state to an admissible one, at least
one tuple deletion must occur.

Consider example 1.1 again. If the insert/delete anomalies of that example
are to be taken seriously, it is because the user wishes to store departments
with managers but no employees, or something of the sort. Among the goals of
decomposition is to enlarge the set of admissible states so as to include,
specifically, states which are not join consistent. A theory which applies only to join
consistent states is inadequate. The central theme of this thesis is the
recreation of the theory of databases on a foundation which allows for
non-join-consistent states. The new theory remains in the spirit of Codd's work in "Further
normalization."

1.3.  Outline  of  the  thesis 

The central concept of the recreation of database theory presented by this
thesis is a new definition of dependency satisfaction for database states. This
definition has its genesis in the work of Honeyman [H] and Vassiliou [V]. The
definition applies to arbitrary states, while preserving an integrated view of
these states. In contrast, it is possible to follow the direction suggested by Codd
and consider a state satisfying whenever each instance within it satisfies its 'local'
dependencies. (These dependencies are those defined on the relation scheme for
the instance.) Such a treatment fragments the information stored in the state. A
theory in which local satisfaction is the definition of satisfaction is more
properly a theory of files than a theory of databases. A database system manages the
data as a whole, not merely as a disjoint collection of parts.

In chapter 2, we introduce the universal scheme assumption. We argue that
the universal scheme provides an appropriate global view of a multi-instance
database state. The isolation of the universal scheme assumption as a foundation
of database theory was first done by Fagin, Mendelzon and Ullman [FMU]. They
consider only the schemes of a database schema; the universal scheme serves
to integrate these schemes. Fagin et al. provide no means of integrating the
data in a state into some larger object. We will take a view of the universal
scheme which in some sense is dual to the view of [FMU]. Where those authors
define the universal scheme as coming from the schemes of the schema by
conjunction, our view is that the decomposed schemes come from the universal
scheme by projection. We present this view as a set of first order axioms. We
claim this set of axioms is the first formalization of the universal scheme
assumption. As the assumption is highly controversial [Parker], a formalization
is called for.

We show further in chapter 2 that the techniques of Honeyman and Vassiliou
are natural consequences of this formalization. We restate their techniques
in the setting of the predicate calculus. Where their results encompassed
functional dependencies only, our results cover the entire class of implicational
dependencies [F4]. Beyond extending their work, the exposition of chapter 2
provides a compelling justification of it, which has not existed heretofore.
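For functional dependencies, the test associated with Honeyman's weak instance can be sketched as a chase of the state tableau: each stored tuple becomes a row over the universe, padded with fresh variables in the columns its scheme lacks; the fds are applied to equate symbols, and the state is satisfying unless the chase is forced to equate two distinct constants. The Python sketch below is a minimal illustration, not the formal development of chapter 2; it assumes single-letter attributes, string constants and integer variables, and the function name `satisfies` is hypothetical. The example states are also hypothetical.

```python
from itertools import count


def satisfies(universe, state, fds):
    """Chase the state tableau of `state` with `fds`.

    Returns False if two distinct constants are forced equal (no weak
    instance exists), True otherwise. Constants must be strings; the
    padding variables are ints.
    """
    fresh = count()
    col = {A: k for k, A in enumerate(universe)}
    rows = []
    for scheme, tuples in state.items():
        for t in tuples:
            vals = dict(zip(scheme, t))
            rows.append([vals[A] if A in vals else next(fresh) for A in universe])

    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            for r1 in rows:
                for r2 in rows:
                    if not all(r1[col[A]] == r2[col[A]] for A in lhs):
                        continue
                    for A in rhs:
                        x, y = r1[col[A]], r2[col[A]]
                        if x == y:
                            continue
                        if isinstance(x, str) and isinstance(y, str):
                            return False  # two distinct constants clash
                        # Replace a variable, keeping a constant if there is one.
                        keep, drop = (y, x) if isinstance(x, int) else (x, y)
                        for r in rows:
                            for k, v in enumerate(r):
                                if v == drop:
                                    r[k] = keep
                        changed = True
    return True


F = [("A", "B"), ("B", "C"), ("A", "C")]
bad = {"AB": {("a", "b")}, "BC": {("b", "c")}, "AC": {("a", "c2")}}
good = {"AB": {("a", "b")}, "BC": {("b", "c")}, "AC": {("a", "c")}}
print(satisfies("ABC", bad, F))   # False
print(satisfies("ABC", good, F))  # True
```

In `bad`, each relation holds a single tuple and so trivially satisfies its embedded dependency; the clash appears only when the relations are considered together, which is exactly what local satisfaction cannot see.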

Chapter 2 concludes by introducing a notion of completeness for database
states, having shown that logical completeness is inappropriate. An informal
discussion of the applicability of the universal scheme assumption ends the
chapter.

The remainder of the thesis examines problems in the theory of databases
whose solutions are known when the definition of satisfaction includes join
consistency. Chapter 3 examines the results of Aho, Sagiv and Ullman on
equivalence of expressions in the relational algebra. For the join consistent case,
they give a procedure which decides whether two expressions return the same
result when evaluated in any state satisfying a given set of dependencies. As
they had no clear definition of satisfaction for non-join-consistent states, they
were uncertain how their results applied. We show that their results apply quite
cleanly.

Chapter 4 is concerned with deciding when a dependency of the universal
scheme is 'present' or 'lost' in a database schema. We find two different
definitions of the presence of a functional dependency in a schema. A schema is
said to represent an fd X->A when there are states of the schema which encode a
non-empty function from X-values to A-values. An fd f is said to constrain a
schema with respect to a set of fd's F (in which f does not appear) when there
exists a state of the schema which satisfies F but not F ∪ {f}. In short, within the
context of F, f changes the set of satisfying states.

We show these definitions are different; in particular, the second is weaker
than the first. We can demonstrate when a particular function is represented by
a schema. More significantly, we show that a schema represents all of a set F
and its consequences under conditions which are strictly weaker than those of
Beeri, Bernstein and Goodman [BBG]. Thus, in contradiction of a well-known
result, it is possible for a schema to be simultaneously in BCNF and lossless and to
represent all of a given set of dependencies. (In fact, losslessness implies
representation in this sense, but not conversely, as we show.)

Even when a dependency is not represented by a schema, it may constrain
the schema's states. The recognition of this difference is a significant result of
chapter 4. Sets of necessary and of sufficient conditions for a given dependency
to act as a constraint are given. Interestingly, acting as a constraint is a 'cover
sensitive' property of dependencies.

Chapter 5 takes up the maintenance of satisfying database states. This is
the most significant practical issue in this thesis. The states of a database
evolve, but each stage of the evolution must be a satisfying state. Maintenance is
the effort involved in determining whether the result of modifying a satisfying state
is a satisfying state. We show that, in general, this effort is as great as the effort
needed to determine whether an arbitrarily selected state is satisfying; that is,
the worst case complexity of the maintenance problem is at least linear in the
size of the state being modified. This result prompts the main exposition of
chapter 5, the description of independence.

A schema is independent if its set of satisfying states is the set of locally
satisfying states. For these schemas, Codd's suggested definition of satisfaction
coincides with ours, as these are the schemas without interrelational
constraints. The effort required to maintain the states of an independent schema is
independent of the size of those states. Chapter 5 characterizes independent
schemas when functional dependencies are the only constraints.


Chapter  2 

Logical  Foundations 

2.1.  The  Universal  Relation  Scheme  Assumption 

Given  that  we  wish  to  extend  the  theory  of  relational  databases  to  database 
states  which  are  not  necessarily  join  consistent,  we  are  faced  with  a  difficult, 
fundamental  question:  What  is  the  information  represented  by  the  data  in  a 
multi-relation  database  state  which  is  not  necessarily  join  consistent? 

This  question  arises  even  when  we  restrict  our  attention  to  join  consistent 
states  only.  However,  for  these  states  it  has  an  obvious  answer.  If  we  know  that 
state p of schema R is join consistent and further that the universal instance I is
such that

$$p(R_i) = \pi_{R_i}(I) \quad \text{for every } R_i \in R,$$

then we can say that the information stored in p is the instance I. Such a
definition  gives  strong  motivation  to  the  concept  of  lossless  decomposition.  If  R 
is  not  a  lossless  decomposition  of  the  universal  relation,  there  may  be  more  than 
one  universal  instance  whose  projections  are  p.  Thus  the  information 
represented  by  p  is  ambiguous. 
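Join consistency itself can be tested mechanically. The following Python sketch relies on the standard fact that a state is join consistent exactly when every one of its relations equals the projection of the join of all of them onto its scheme. The relation encoding (a pair of an attribute tuple and a set of value tuples) and the helper names `project`, `natural_join` and `join_consistent` are our own, hypothetical choices.

```python
# A relation is a pair (attrs, rows): a tuple of attribute names and a set of
# value tuples listed in that attribute order.

def project(attrs, rows, onto):
    """pi_onto(r): keep only the columns named in `onto`, in that order."""
    idx = [attrs.index(a) for a in onto]
    return {tuple(row[i] for i in idx) for row in rows}

def natural_join(r, s):
    """Natural join of two relations on their common attributes."""
    (ra, rrows), (sa, srows) = r, s
    common = [a for a in ra if a in sa]
    extra = [a for a in sa if a not in ra]
    out = set()
    for x in rrows:
        for y in srows:
            if all(x[ra.index(a)] == y[sa.index(a)] for a in common):
                out.add(x + tuple(y[sa.index(a)] for a in extra))
    return (ra + tuple(extra), out)

def join_consistent(state):
    """True iff each relation equals the projection of the join of all relations."""
    joined = state[0]
    for rel in state[1:]:
        joined = natural_join(joined, rel)
    ja, jrows = joined
    return all(project(ja, jrows, attrs) == rows for attrs, rows in state)
```

With the relations of example 2.1, adding a supplier to SC that supplies no recorded part already breaks join consistency, since that tuple is lost in the join.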

This definition of the information presented by a state simply does not
apply if the state is not join consistent. We should ask ourselves what any such
definition must provide to be acceptable. We might be content believing that
the state provides no more information than is provided by the individual
instances in the state. But we cannot deny that the collection of instances taken
together provides more information than this. Indeed, the motivation for database
management, as opposed to file management, is the unification of the information
stored in the individual files.


Example 2.1. Suppose we have the attributes Suppliers, Parts, and Cities.
Suppose we have, in a state of the schema {SP, SC}, the facts "DEC supplies
PDP/11's" and "DEC is located in Maynard". Then we know, in some perhaps
vague sense, that PDP/11's are supplied by someone in Maynard. That fact
is not stored in the database. ■

We  require  a  formal  object  which  allows  us  to  speak  of  the  database  as  a 
whole.  Such  an  object  is  the  universal  relation  scheme,  the  relation  scheme 
containing  all  the  attributes  of  the  universe.  The  universal  relation  scheme 
assumption  is  the  claim  that  this  scheme  has  a  meaning.  It  is  not  clear  how  the 
data  in  the  database  "populate"  this  scheme.  In  other  words,  it  is  not  clear  how 
the  universal  scheme  can  be  used  to  answer  our  question. 

The isolation of the universal relation scheme assumption as separate from
and more basic than the universal relation instance assumption* is due to Fagin
et al. [FMU]. Their own example best illustrates their point of view.

Example 2.2. Let the universal scheme be CTRHSG where the meanings of
these letters are

    C  Course        H  Hour
    T  Teacher       S  Student
    R  Room          G  Grade

The meaning of the universal scheme, which we can take to be the description
of the universal relation, is given by

    { ctrhsg | t "teaches" c and
               c "meets in" r "at time" h and
               s "is getting" g "in" c }

From this description, the "natural" decomposition is {CT, CRH, CSG}. Letting
P_1 be the predicate for CT (that is, P_1(ct) means that teacher t
teaches course c), P_2 for CRH, and P_3 for CSG, this description may be
rewritten as

*This is the requirement that states be join consistent.


    { ctrhsg | P_1(ct) and
               P_2(crh) and
               P_3(csg) }  ■

The approach of example 2.2 assigns to the meaning of the universal
scheme the conjunction of the meanings of the schemes of the decomposition.
As is described in the appendix on the relational algebra, the conjoining of
predicates is equivalent to the joining of relation instances. Following this lead, we
would assign as the information stored in a state of {CT, CRH, CSG} the join of
the instances of the state. This approach is satisfactory only for join consistent
states and is useless in solving our problem. We can find a useful attack by
adopting a point of view which is in some sense the dual of the point of view of
example 2.2.
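The equivalence between conjoining predicates and joining instances can be checked by brute force on a small example. In this Python sketch the instances for P_1, P_2 and P_3 are hypothetical data; the conjunction is evaluated over the active domain of each attribute and compared with the natural join on the shared attribute C.

```python
from itertools import product

# Hypothetical instances: P1 over CT, P2 over CRH, P3 over CSG.
P1 = {("CS2410", "Corneil")}
P2 = {("CS2410", "BA1170", "10am")}
P3 = {("CS2410", "Smith", "A"), ("CS2410", "Jones", "B")}

def by_conjunction():
    """{ctrhsg | P1(ct) and P2(crh) and P3(csg)}, enumerated over the
    active domain of each attribute."""
    C = {t[0] for t in P1 | P2 | P3}
    T = {t[1] for t in P1}
    R = {t[1] for t in P2}
    H = {t[2] for t in P2}
    S = {t[1] for t in P3}
    G = {t[2] for t in P3}
    return {(c, t, r, h, s, g)
            for c, t, r, h, s, g in product(C, T, R, H, S, G)
            if (c, t) in P1 and (c, r, h) in P2 and (c, s, g) in P3}

def by_join():
    """The natural join of the three instances on C."""
    return {(c, t, r, h, s, g)
            for (c, t) in P1
            for (c2, r, h) in P2 if c2 == c
            for (c3, s, g) in P3 if c3 == c}
```

Both functions return the same two CTRHSG tuples here, one per enrolled student.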

Suppose we know that Corneil teaches CS2410. So we have recorded
P_1(CS2410, Corneil). Then, reasoning from the set of attributes in the universal
scheme (as opposed to reasoning from the nature of reality), we can say that it
is possible for CS2410 to be assigned a meeting time and for students to enroll in
it and receive grades. This reasoning can be expressed as a first order sentence,
namely,

$$P_1(\mathrm{CS2410}, \mathrm{Corneil}) \Rightarrow (\exists h\, r\, s\, g)\; P(\mathrm{CS2410}, \mathrm{Corneil}, h, r, s, g) \qquad (*)$$

where P is the universal predicate, the meaning of the universal scheme. We
must be careful in interpreting this sentence. In particular, we must be careful
when instantiating the existentially bound variables. The rationale for the
decomposition, in Codd's terms, is to allow the fact that Corneil teaches CS2410
to be recorded before the course schedule has been made and any students
have enrolled. So the existentially bound variables record the possibility of
expanding the tuple <CS2410, Corneil> of the instance for P_1 in the state to a
tuple of the universal instance. It may be, in fact, that CS2410 is not to be offered
during the lifetime of the database. Then we may never instantiate h, r, s, g with
actual hours, rooms, students or grades.

The substance of the sentence (*) is that the meaning of the scheme CT,
that is, the predicate P_1, is consistent with the meaning of the universal scheme
CTRHSG, the predicate P. The description of projection in the appendix on
relational algebra shows that the predicate for an instance derived by projection
satisfies a sentence of the form of (*). Thus the point of view being taken here is
that the predicate of a scheme in the database schema is the projection of the
predicate for the universal scheme. This projection of predicates is independent
of any instance.

We may now put forward a definition of the information presented by a
multi-relation database state which is not necessarily join consistent. We
associate with a database scheme R a many-sorted language L_R = <P, ∅, C>. A
many-sorted language is a syntactic variant of the familiar first-order languages of the
predicate calculus. Within a many-sorted language, the variables, constants and
predicate symbols are each assigned a sort. The sort of a predicate symbol is a
k-tuple of sorts. If predicate symbol P has sort <s_1, ..., s_k>, then we may write
P(v_1, ..., v_k) as a formula of the language only if each v_i has sort s_i.

The sorts of L_R are the attributes of U. The elements of L_R may now be
described.

1) P is the set of predicate symbols of L_R. If R = {R_1, ..., R_k} and
U = {A_1, ..., A_l}, then P = {P, P_1, ..., P_k, =_{A_1}, ..., =_{A_l}}. If
R_i = A_{i_1} ... A_{i_m}, then the sort of P_i is <A_{i_1}, ..., A_{i_m}>. The sort of P
is <A_1, ..., A_l>. Thus P_i is the predicate associated with R_i, and P is the
universal predicate. For each A ∈ U, the predicate =_A is the identity predicate
of sort <A, A>.

2) L_R contains no function symbols.

3) C, the set of constants of L_R, is the set ⋃_{A∈U} dom(A). For c ∈ dom(A), the
sort of c is A. Since every symbol is uniquely assigned a sort, we must have
disjoint domains.*

We stress that the use of a many-sorted language is purely a syntactic
device. By employing it, we restrict the formulae which we may write down. In
particular, notice that there is no way that we may compare values of distinct
attributes. We may not write an atomic formula P_i(t) unless t is a tuple of symbols of
the sorts required by P_i. From now on we will assume that the formulas we write
conform to this syntax. Therefore we will take the liberty of dropping the
subscripts from the equality predicates.
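The sort discipline amounts to a simple well-formedness check on atomic formulas. This Python sketch assumes the schema {SP, SC} of example 2.1; the dictionaries `SORTS` and `CONSTANT_SORT` and the function names are hypothetical encodings used only for illustration.

```python
# Sorts of the predicate symbols, for U = (Supplier, Part, City) and the
# hypothetical schema {SP, SC}.
SORTS = {
    "P": ("Supplier", "Part", "City"),  # universal predicate
    "P1": ("Supplier", "Part"),
    "P2": ("Supplier", "City"),
}
# Each constant has exactly one sort, since the domains are disjoint.
CONSTANT_SORT = {"DEC": "Supplier", "PDP/11": "Part", "Maynard": "City"}

def sort_of(symbol, var_sort):
    """Sort of a constant, or of a variable as declared in var_sort (else None)."""
    return CONSTANT_SORT.get(symbol, var_sort.get(symbol))

def well_formed(pred, args, var_sort):
    """True iff pred(args) is an atomic formula: right arity and right sorts."""
    sort = SORTS[pred]
    return len(args) == len(sort) and all(
        sort_of(a, var_sort) == s for a, s in zip(args, sort))

def well_formed_eq(x, y, var_sort):
    """x = y is a formula only when x and y have the same (known) sort."""
    s = sort_of(x, var_sort)
    return s is not None and s == sort_of(y, var_sort)
```

In particular `well_formed_eq("DEC", "Maynard", {})` is false: a supplier can never be compared with a city.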

Having constructed L_R, we begin to construct axioms in it. The first set of
axioms are constructed directly from the decomposition R and the attribute
domains. They are therefore called the schema axioms. They are of two kinds.

1) The containing instance axioms. For each P_i ∈ P we have†

$$(\forall t)\, P_i(t) \Rightarrow (\exists u)\, P(ut)$$

2) The distinct constant axioms. For each attribute A and each pair of distinct
values b, c ∈ dom(A), the inequality

$$b \neq c$$

The containing instance axioms are a formalization of the universal relation scheme
assumption. They represent the only such formalization yet attempted. The
distinct constant axioms prevent any two constants from being equated. It is not
uncommon in the database context for two distinct values to reference the same
thing: an item having distinct item numbers in different divisions of a company.
The distinct constant axioms assert only that the two item numbers, as values,
are distinct.

*Which means nothing more than that employee number 1 is different from bank account
number 1.

†We have begun abusing the many-sorted notation. Thus u is a tuple on the sorts U - R_i and
the tuple ut is assumed to be written in the proper order.


The schema axioms are derived solely from the schema R and the domain
definitions. From a database state p we construct a set of axioms to be denoted
A_p. The sentences in A_p are of two kinds.

1) The schema axioms for R, as above.

2) The database instances. For each tuple t in each relation p(R_i), the atom
or "ground instance"

    P_i(t)
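As an illustration only, the following Python sketch writes out the sentences of A_p as strings for the state of example 2.1. It assumes finite stand-ins for the attribute domains (the `DOM` dictionary is hypothetical); real domains may be infinite, in which case there are infinitely many distinct constant axioms.

```python
from itertools import combinations

# Hypothetical encoding of the schema {SP, SC} of example 2.1.
U = ("Supplier", "Part", "City")
SCHEMES = {"P1": ("Supplier", "Part"), "P2": ("Supplier", "City")}
STATE = {"P1": {("DEC", "PDP/11")}, "P2": {("DEC", "Maynard")}}
# Finite stand-ins for the attribute domains (assumption).
DOM = {"Supplier": ["DEC", "HP"], "Part": ["PDP/11"], "City": ["Maynard", "Palo Alto"]}

def theory():
    """The sentences of A_p (schema axioms plus ground instances), as strings."""
    sentences = []
    # 1) Containing instance axioms: (forall t) P_i(t) => (exists u) P(ut),
    #    with ut written in the attribute order of U.
    for name, scheme in SCHEMES.items():
        t = ", ".join(a.lower() for a in scheme)
        u = ", ".join(a.lower() for a in U if a not in scheme)
        full = ", ".join(a.lower() for a in U)
        sentences.append(f"(forall {t}) {name}({t}) => (exists {u}) P({full})")
    # 2) Distinct constant axioms: b != c for each pair of distinct values.
    for values in DOM.values():
        for b, c in combinations(values, 2):
            sentences.append(f"{b} != {c}")
    # Database instances: one ground atom P_i(t) per stored tuple t.
    for name, rows in STATE.items():
        for t in sorted(rows):
            sentences.append(f"{name}({', '.join(t)})")
    return sentences
```

Here `theory()` yields six sentences, among them "(forall supplier, part) P1(supplier, part) => (exists city) P(supplier, part, city)" and "DEC != HP".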

Definition. The information stored in a database state p is the set of all
models of the theory A_p. ■

When A_p is given by 1 and 2, it is consistent and therefore has a model. The
next section introduces constraints into A_p. Certain states will generate
inconsistent sets of axioms and thus store no information. However, A_p, when it has a
model, will have infinitely many models of arbitrarily large cardinalities. This
definition seems to bring us more than we bargained for. This phenomenon is not
restricted to database theory.

Example 2.3. As in example 2.1, consider the state p given by

    r_1:  Supplier  Part            r_2:  Supplier  City
          DEC       PDP/11                DEC       Maynard

A model of A_p is given by

    M_1:  P_1(DEC, PDP/11)
          P_2(DEC, Maynard)
          P(DEC, PDP/11, Maynard)

Thus M_1 demonstrates that PDP/11's are supplied from Maynard. But we
also have as a model of A_p

    M_2:  P_1(DEC, PDP/11)
          P_2(DEC, Maynard)
          P(DEC, PDP/11, Palo Alto)
          P(DEC, HP3000, Maynard)

Although M_2 is a model of A_p, it is not a model of reality. In some sense this
is unavoidable. A database management system which stores only models of
reality would require super-human intelligence, as human beings are quite
capable of believing false things. The techniques of the next section will
eliminate M_2 as a model. However, they will not eliminate

    M_3:  P_2(DEC, Maynard)
          P_2(IBM, San Jose)
          P_2(HP, Palo Alto)
          P_2(Goblotny, Zlorch)
          ... and so forth

We might consider eliminating some of the nonsense from M_3 (i.e.,
P_2(Goblotny, Zlorch) and P_2(α, β)) by including constraints with the effect
of allowing as models only structures constructed from constants, the set C
of L_R. Such axioms would limit the models to be of no greater than countable
cardinality. This cardinality restriction cannot be accomplished in a first-order
language, since a first-order theory with an infinite model has models of
arbitrarily large cardinality (the upward Löwenheim-Skolem theorem);
therefore, such constraints cannot be written.

Even when we restrict our attention to those models constructed solely
from constants, we do not eliminate P_2(IBM, San Jose) and P_2(HP, Palo Alto)
from M_3. These facts are not asserted by p. We will return, in the section on
completeness, to some thoughts on how the excessive information in some
of these models can be confronted. ■

2.2.  Dependencies 

In the preceding subchapter we associated with a database state p a set of
first order axioms A_p. These axioms were "unconstrained" in the sense that for
every state p, A_p is consistent. A valuable service of any database management
system is that it protect the data from corruption. The system should be
capable of recognizing certain states as inadmissible and refusing them admission. We
therefore add to A_p a set of first order statements called constraints. These may
be any statements that are not tautologies.

Presumably these constraints are added to A_p by the user and presumably
they reflect properties of the world of interest to the user. Our interest is in
those constraints which have been called data dependencies, particularly in the
"classic" dependencies: the functional, multivalued and join dependencies.* We
remark in passing that the data dependencies are simultaneously too powerful
and too weak to serve as the only source of user supplied constraints. Their
weakness is illustrated by their inability to express the constraint: "the sum of
the salaries in a department may not exceed the budget for the department."
The excessive power of the dependencies to express properties of no conceivable
interest is argued convincingly by Sciore [Sc]. In spite of that, historical
concerns alone would justify our interest in the class.

From now on, then, the set of axioms A_p which are associated with the state
p is partitioned into three parts.

1) The schema axioms.

2) The ground instances (the database state).

3) The set of dependencies, written as statements about the universal
relation.

Parts 1 and 2 of this definition are explained in the previous subchapter.
Observe that the classic dependencies are defined within the context of a single
relation. For example, the functional dependency "Employee functionally
determines Salary", as a statement about the real world, asserts that every employee
has a single salary. As a statement about databases, the fd EMP → SAL asserts
that any two tuples, of some underlying relation, which agree on EMP agree on
SAL.

Formally, as a statement about the universal scheme where the universe U
is partitioned into X, Y and Z, the fd X → Y is written

$$(\forall x\, y_1\, y_2\, z_1\, z_2)\; P(x y_1 z_1) \wedge P(x y_2 z_2) \Rightarrow y_1 = y_2$$

Similarly, the multivalued dependency X →→ Y in the same universe is written

$$(\forall x\, y_1\, y_2\, z_1\, z_2)\; P(x y_1 z_1) \wedge P(x y_2 z_2) \Rightarrow P(x y_1 z_2)$$

*See the appendix for definitions.
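Read as conditions on a finite universal instance, the two sentences above can be checked directly. In this Python sketch (function names hypothetical), an instance is a set of tuples over an attribute tuple, and X and Y are lists of attributes. The mvd check builds, for each pair of rows agreeing on X, the row that takes its X and Y values from the first row and its Z values from the second.

```python
from itertools import product

def satisfies_fd(attrs, rows, X, Y):
    """X -> Y: any two rows that agree on X agree on Y."""
    xi = [attrs.index(a) for a in X]
    yi = [attrs.index(a) for a in Y]
    for t1, t2 in product(rows, repeat=2):
        if all(t1[i] == t2[i] for i in xi) and any(t1[i] != t2[i] for i in yi):
            return False
    return True

def satisfies_mvd(attrs, rows, X, Y):
    """X ->-> Y with Z = U - XY: P(x y1 z1) and P(x y2 z2) imply P(x y1 z2)."""
    xi = [attrs.index(a) for a in X]
    zi = [attrs.index(a) for a in attrs if a not in X and a not in Y]
    for t1, t2 in product(rows, repeat=2):
        if all(t1[i] == t2[i] for i in xi):
            # The row taking X and Y from t1 and Z from t2 must be present.
            t3 = list(t1)
            for i in zi:
                t3[i] = t2[i]
            if tuple(t3) not in rows:
                return False
    return True
```

For instance, the universal relation of the model M_2 of example 2.3 violates the fd Supplier → City, since DEC appears with both Palo Alto and Maynard.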

If we had not taken the approach of adding to the language L_R a universal
predicate symbol, we might have no means of expressing these dependencies. In the
presence of X →→ Y, it is customary to decompose U into {XY, XZ}, leaving no
scheme in which the mvd is meaningful. By restricting the dependencies to the
universal scheme, we establish that each dependency has a single meaning. Thus
if two or more relation schemes include XY, we define the fd X → Y, if at all, only
in the universal scheme. As becomes clearer in later chapters, this ensures that
all schemes containing XY embody the same function from X values to Y values.
This is the so-called "uniqueness assumption" of Bernstein [B], itself supposedly
a consequence of the universal relation scheme assumption [FMU]. That the
uniqueness assumption is a theorem of A_p lends strength to our claim that we
have formalized the universal relation scheme assumption.

2.3. The Weak Instance, Tableaux and the Chase

Given  the  development  to  this  point,  the  proper  definition  of  a  satisfying  or 
admissible  state  is  obvious. 

Definition. A database state p is satisfying if the theory A_p is consistent,
that is, if A_p has a model.

In  this  section  we  turn  our  attention  to  the  development  of  a  procedure  for 
deciding  if  a  given  state  is  satisfying. 

We recall, from the appendix, that a model for A_p is a mapping or
assignment of values to the symbols of A_p; i.e., the constants are given values, the
predicate letters are assigned to relations, and so forth. Such an assignment is
called an interpretation. We say that a model M for A_p is standard if

1) M interprets each constant as itself.

2) M interprets each P_i as p(R_i).

3) M interprets P (the universal predicate) as a subset of
dom(A_1) × ... × dom(A_l) (where U = {A_1, ..., A_l}).

Lemma 2.1. For any p, if A_p has a model, it has a standard model.

Proof. Suppose M is a model of A_p. We denote by over-barring the
interpretation function of M. Thus for each c ∈ C (C the set of constants of L_R), its
interpretation in M is c̄; similarly, the predicate P_i is assigned to the relation P̄_i.

We form from M a structure M' which satisfies 1, 2, 3 and then show M' to
be a model for A_p. For every tuple <c_1, ..., c_k> ∈ p(R_i), P̄_i contains the tuple
<c̄_1, ..., c̄_k>. Discard from P̄_i any tuple which is not of this form. Let M'' be the
result of reducing each P̄_i in this way. Clearly M'' is a model of A_p, since the P_i
occur only in the ground instances and on the left of the containing instance axioms.

By the distinct constant axioms, the interpretation of constants is invertible
(i.e., it is one-to-one). By the Löwenheim-Skolem theorem [Enderton], we
may assume the domain of M, which is also the domain of M'' and which is
denoted A_M, is countable. Let i be any injection i: A_M → C which extends
{(c̄, c) | c ∈ C}, and let M' be the image of M'' under i. As M' is isomorphic to M'',
M' is a model of A_p. ■

Clearly the decision problem for satisfaction reduces to the problem of
finding an instantiation for the universal predicate. In other words, we need to
discover whether there exists an instance w over U such that

W1) π_{R_i}(w) ⊇ p(R_i) for every R_i ∈ R, and

W2) w is a model of the dependencies in A_p; i.e., w is satisfying in the old
sense.

Property W1 is a consequence of the schema and database axioms. Property W1
is the justification for the name "containing instance axioms". Any universal
instance which satisfies W1 is called a containing instance. As we have observed,
every p has a containing instance.


An instance satisfying both W1 and W2 is called a weak instance for p. From
now on we will use the phrases "A_p is consistent," "A_p has a model," "p is
satisfying" and "p has a weak instance" interchangeably.

The definition of satisfaction for a state in terms of the existence of a weak
instance for it is due originally to Honeyman [H]. An independent, equivalent
formulation can be found in Vassiliou [V]. Both authors consider only functional
dependencies as constraints. Neither author supplies a justification of their
definitions in terms as compelling as the definition given above. That definition is
applicable no matter what set of dependencies appears in A_p. Some non-trivial
consequences of this extension are discussed in the next section.

The algorithm presented by Honeyman for testing satisfaction operates
when only functional dependencies are present. We will show that when A_p
contains only finitely many dependencies, all of which are total, Honeyman's
procedure will always terminate correctly. If, however, some of the dependencies
are partial, this procedure may fail to terminate. We will show that the
procedure can be used as the basis of a semi-decision procedure for the
complement of the decision problem; i.e., the set of non-satisfying database states is
recursively enumerable even in the presence of partial dependencies. The
decidability of the satisfaction problem in the presence of partial dependencies is an
open problem.

Honeyman's procedure is an application of the chase. The chase was
originally introduced by Aho et al. [ABU] and has seen wide application. Among many
others, the chase is discussed in [MMS] and [ASU]. We give a variant of the chase
sufficient to serve as the procedure for deciding satisfaction. A more general
variant is given in the next chapter.

For a given state p of some schema R, the tableau for p, denoted T_p, is a
relation on the universe of attributes of R over an expanded set of domains. For
each attribute A, the tableau domain of A is tdom(A) = dom(A) ∪ Ndv(A), where
Ndv(A) is a countable set disjoint from dom(A) whose members are called
non-distinguished variables or ndv's. We impose a partial order < on tdom(A) such
that

• for c ∈ dom(A), b ∈ Ndv(A), c < b

• < is a well order for Ndv(A)

By omission, the elements of dom(A) are incomparable under <. We define T_p as
follows:

For every relation p(R_i) and every tuple t ∈ p(R_i) there is a row u of T_p with

• u[R_i] = t

• u[B] ∈ Ndv(B), appearing nowhere else in T_p, for every B ∈ U - R_i

Observation 1. T_p is a containing instance for p. That is, π_{R_i}(T_p) ⊇ p(R_i) for
every R_i ∈ R.
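The construction of T_p is mechanical. The Python sketch below represents an ndv as a pair of an attribute and an index (the `Ndv` class and the `tableau` function are hypothetical names) and pads each stored tuple with fresh ndvs, so Observation 1 can be checked on the state of example 2.3.

```python
from dataclasses import dataclass
from itertools import count

@dataclass(frozen=True)
class Ndv:
    """A non-distinguished variable of sort `attr`, numbered by `index`."""
    attr: str
    index: int

def tableau(U, state):
    """T_p: for each tuple t of each p(R_i), one row that agrees with t on R_i
    and holds a fresh ndv (appearing nowhere else) in every other column."""
    fresh = count()
    rows = set()
    for scheme, tuples in state.items():
        for t in sorted(tuples):
            rows.add(tuple(t[scheme.index(a)] if a in scheme else Ndv(a, next(fresh))
                           for a in U))
    return rows

U = ("Supplier", "Part", "City")
STATE = {
    ("Supplier", "Part"): {("DEC", "PDP/11")},
    ("Supplier", "City"): {("DEC", "Maynard")},
}
T = tableau(U, STATE)
# Observation 1: each projection of T_p contains the corresponding relation.
for scheme, tuples in STATE.items():
    idx = [U.index(a) for a in scheme]
    assert {tuple(r[i] for i in idx) for r in T} >= tuples
```

Here T has two rows: DEC, PDP/11 and a City ndv numbered 0; and DEC, a Part ndv numbered 1, and Maynard.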

Observation 2. The structure for L_R given by

• P_i is p(R_i)

• P is T_p

• every constant of L_R is mapped to itself

is a model of the schema and database axioms of A_p.

These two observations are equivalent reformulations of the same idea. We
now describe the process of chasing, which is applied to tableaux.

The chase is a process for converting a tableau to one which satisfies a
given set of constraints. We recast dependencies in tableau form. Every
dependency d may be coded as a pair <lhs(d), rhs(d)> in the following way. Let
{P(v_1 ... v_k), ..., P(w_1 ... w_k)} be the matrix of the left hand side of d; that
is, the set of atomic formulae whose conjunction forms the quantifier free
portion of the left hand of the implication. Then lhs(d) is the tableau

    A_1   ...   A_k
    v_1   ...   v_k
     .           .
     .           .
    w_1   ...   w_k

If d is a tuple generating dependency, then rhs(d) is the tableau formed from
the matrix of the right hand of the implication. If the right hand of d is the
equality x_i = x_j, then rhs(d) is the equality assertion "x_i = x_j". It is assumed that
all of the symbols of lhs(d), rhs(d) are non-distinguished variables of the
proper sorts.

A transformation is a pair <d,m> where
    d is a dependency of A_p
    m is a function on tableau symbols
The transformation <d,m> is enabled for a tableau T_p if m satisfies

$$m(\mathrm{lhs}(d)) \subseteq T_p$$

where m is also used for the tuple and set extension of m; that is,
m(t) = <m(t[A_1]), ..., m(t[A_k])> and m({t_1, ..., t_l}) = {m(t_1), ..., m(t_l)}.

The tableau <d,m>(T_p), if <d,m> is enabled for T_p, is given by the following
scheme:

• If d is an egd with rhs(d) = "x_i = x_j", then <d,m>(T_p) is T_p with all
occurrences of m(x_i), m(x_j) replaced with the lesser of the two values with
respect to <. Such a value will not exist exactly when m(x_i), m(x_j) are
distinct constants. In this case <d,m>(T_p) = ∅ and <d,m> is said to be
contradicted or a contradiction.

• If d is a tgd, then <d,m>(T_p) is T_p ∪ S where S is given by

$$S = m(\mathrm{rhs}(d))$$

If d is partial, rhs(d) contains symbols not in lhs(d). These are the
existentially bound variables of the dependency. Assume m maps each of these to
a unique ndv not appearing in T_p.
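For a functional dependency X → A written as an egd, an enabled transformation amounts to choosing two rows of the tableau that agree on X. The following Python sketch (all names hypothetical, fds only) applies one such transformation using the order < defined above: a constant precedes every ndv, ndvs are ordered by index, and two distinct constants have no lesser element, so the result is the empty tableau.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Ndv:
    """A non-distinguished variable of sort `attr`, numbered by `index`."""
    attr: str
    index: int

def lesser(a, b):
    """The lesser of two symbols of the same sort under <, or None if they
    are distinct constants (which are incomparable)."""
    if a == b:
        return a
    if isinstance(a, Ndv) and isinstance(b, Ndv):
        return a if a.index < b.index else b  # ndvs are well ordered by index
    if isinstance(a, Ndv):
        return b  # a constant precedes every ndv
    if isinstance(b, Ndv):
        return a
    return None

def apply_fd_egd(U, T, X, A, row1, row2):
    """Apply the egd transformation for the fd X -> A whose two left-hand rows
    are mapped to row1 and row2 of T. Returns the new tableau, or the empty
    set when the transformation is a contradiction."""
    xi, ai = [U.index(a) for a in X], U.index(A)
    if not (row1 in T and row2 in T and all(row1[i] == row2[i] for i in xi)):
        raise ValueError("transformation is not enabled for T")
    u, v = row1[ai], row2[ai]
    keep = lesser(u, v)
    if keep is None:
        return set()
    # Symbols are sorted, so u and v can only occur in column A.
    return {r[:ai] + ((keep if r[ai] in (u, v) else r[ai]),) + r[ai + 1:] for r in T}
```

Because an ndv always yields to a constant, a transformation of this kind never overwrites a stored value; it either fills in an ndv or fails outright.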

Let τ be a transformation. We say τ is an egd (or tgd) if the dependency in τ
is an egd (or tgd). Similarly, we describe a transformation as partial or total
according to whether its dependency is partial or total.

A row of T_p is said to correspond to the row of τ(T_p) which it has become by
the action of τ. If τ is a tgd, not every row of τ(T_p) corresponds to a row of T_p.

A sequence of transformations <τ_1, ..., τ_n> is said to be syntactically valid
for T_p if

• τ_1 is enabled in T_p

• τ_i is enabled in τ_{i-1}(... (τ_1(T_p)) ...) for every 1 < i ≤ n

If χ = <τ_1, ..., τ_n>, then we define χ(T_p) = τ_n(τ_{n-1}(... τ_1(T_p) ...)).
Correspondence between rows may be transitively extended across transformation
sequences. The relation of correspondence between a row of T_p and a row of
χ(T_p) is functional.

The completion of T_p is T*_p = χ(T_p) for a syntactically valid sequence χ,
provided no application of a transformation to T*_p results in a change to it. If A_p
contains partial dependencies, χ may be infinitely long and T*_p infinitely large. In the
case that A_p contains no partial dependencies, however, we can show that T*_p is finite
and uniquely defined. This is equivalent to showing that the chase has the finite
Church-Rosser property, defined momentarily. We do this by applying a special
case of a theorem due to Sethi.

A replacement system is a set of objects O and a set of functions
{f | f: O → O}. We write m ⇒ n if there exists a function f with f(m) = n. ⇒* is the
reflexive transitive closure of ⇒. An object m is irreducible if there is no object n
such that m ⇒ n. ⇒! is the completion of ⇒*; thus m ⇒! n iff m ⇒* n and n is
irreducible. A replacement system is finite if there is a maximum number of moves
(applications of functions) which can be made starting at any object. A
replacement system is Church-Rosser if ⇒! is a function; that is, m ⇒! n and m ⇒! p
only if n = p.

Proposition 2.1. [S] A replacement system is finite Church-Rosser if it is
finite and for all objects m, n and p: m ⇒ n and m ⇒ p implies there exists an
object q such that n ⇒* q and p ⇒* q. ■

We begin by showing that the chase is a finite procedure when only total
dependencies are present.

Lemma 2.2. If p is finite and all of the dependencies in A_p are total, then
there exists a finite sequence of transformations χ such that χ(T_p) = T*_p.

Proof. This was originally proven as lemma 3 of [MMS] in a different setting.
Since p is finite, T_p is finite. Since all the dependencies in A_p are total, none
introduce new symbols in T_p. There are only finitely many relations which may
be constructed with the symbols in T_p. We claim that if τ is any transformation
such that τ(T) ≠ T for any tableau T, then there exists no sequence of
transformations ξ such that

$$\xi(\tau(T)) = T$$

If τ is an egd, T contains a symbol not in τ(T) and no transformation of ξ
will re-introduce it. If τ is a tgd, the symbols of T and τ(T) are the same. If ξ
contains only tgd's, then ξ(τ(T)) contains rows not in T. If ξ contains an egd,
some symbol of T does not appear in ξ(τ(T)).

If χ(T_p) = T*_p, where χ = <τ_1, τ_2, ...> is not finite, we may eliminate from χ all
transformations τ_i where

$$\tau_i(\tau_{i-1}(\cdots(\tau_1(T_p))\cdots)) = \tau_{i-1}(\cdots(\tau_1(T_p))\cdots)$$

If ξ is the resulting sequence, it is syntactically valid and ξ(T_p) = χ(T_p) = T*_p. By
the claim, no tableau occurs twice along ξ; since only finitely many tableaux can
be built from the symbols of T_p, ξ is only finitely long. ■

From  lemma  2.2  we  know  that  if  we  apply  the  chase  procedure  to  a  tableau 
in  such  a  way  that  every  transformation  has  an  effect,  then  after  finitely  many 
such  steps  we  will  have  calculated  the  completion  of  the  tableau.  We  assume 
from  now  on  that  the  chase  is  always  carried  out  in  this  way. 
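Under that convention, a chase restricted to functional dependencies (all total egds) can be sketched as a loop that applies only transformations which change the tableau. This Python sketch is illustrative and uses hypothetical names; each effective step removes one symbol from the tableau, so the loop terminates, and an empty result signals a contradicted transformation.

```python
from dataclasses import dataclass
from itertools import count

@dataclass(frozen=True)
class Ndv:
    """A non-distinguished variable of sort `attr`, numbered by `index`."""
    attr: str
    index: int

def tableau(U, state):
    """T_p: one row per stored tuple, padded with fresh ndvs."""
    fresh = count()
    return {tuple(t[s.index(a)] if a in s else Ndv(a, next(fresh)) for a in U)
            for s, rows in state.items() for t in sorted(rows)}

def lesser(a, b):
    """Lesser symbol under <, or None for two distinct constants."""
    if a == b:
        return a
    if isinstance(a, Ndv) and isinstance(b, Ndv):
        return a if a.index < b.index else b
    if isinstance(a, Ndv):
        return b
    if isinstance(b, Ndv):
        return a
    return None

def enabled_step(U, T, fds):
    """An fd transformation that would change T: two rows agreeing on X but
    not on A. Returns (column, u, v), or None if T satisfies every fd."""
    for X, A in fds:
        xi, ai = [U.index(a) for a in X], U.index(A)
        for r1 in T:
            for r2 in T:
                if all(r1[i] == r2[i] for i in xi) and r1[ai] != r2[ai]:
                    return ai, r1[ai], r2[ai]
    return None

def chase(U, T, fds):
    """Apply effective transformations until none applies; return T*_p."""
    T = set(T)
    while (step := enabled_step(U, T, fds)) is not None:
        ai, u, v = step
        keep = lesser(u, v)
        if keep is None:
            return set()  # distinct constants: the transformation is contradicted
        T = {r[:ai] + ((keep if r[ai] in (u, v) else r[ai]),) + r[ai + 1:] for r in T}
    return T
```

In the second test state below, each relation satisfies Supplier → City on its own, yet the tableau chases to ∅.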


We apply Sethi's theorem by showing that transformations "almost"
commute.

Lemma 2.3. For any two transformations τ_1, τ_2 enabled for a tableau T, if
τ_1(T), τ_2(T) are both not empty, then there exist transformations τ_1', τ_2' such
that

$$\tau_1'(\tau_2(T)) = \tau_2'(\tau_1(T))$$

Proof. Let τ_1 = <d,m>, τ_2 = <e,n>. There are three cases to consider.


Case 1. Both d, e are equality generating dependencies. Let S_d be the set
of symbols appearing in the rows of d, S_e the symbols of e. Let rhs(d) = "x_i = x_j",
rhs(e) = "x_k = x_l". Let Y_d = {y | y ∈ S_d and either n(x_k) = m(y) or n(x_l) = m(y)}.
Define Y_e similarly. Let τ_1' = <d,m'> where

$$m'(z) = \begin{cases} m(z) & \text{for } z \notin Y_d \\ \min\{n(x_k), n(x_l)\} & \text{for } z \in Y_d \end{cases}$$

(Note that Y_d = ∅ implies τ_1' = τ_1.) Define τ_2' symmetrically.

Clearly τ_1'(τ_2(T)) and τ_2'(τ_1(T)) differ only at the images of the x's from T.
Likewise it is clear that for any t ∈ T, if t[A] = o(z) where o is one of m, n and
z ∈ {x_i, x_j, x_k, x_l}, then for the tuple t' corresponding to t in either τ_1'(τ_2(T)) or
τ_2'(τ_1(T)),

$$t'[A] = \min\{m(x_i), m(x_j), n(x_k), n(x_l)\}$$

if this symbol is defined. If this minimum is not defined, then
τ_1'(τ_2(T)) = τ_2'(τ_1(T)) = ∅.


Case 2. d is equality generating, e is tuple generating. Choose τ_1' = τ_1. To
form τ_2', suppose rhs(d) = "x_i = x_j". Let S be the set of symbols appearing in the
rows of lhs(e). If n(s) ∉ {m(x_i), m(x_j)} for every s ∈ S, then τ_2' = τ_2. Otherwise, let
Y = {y | y ∈ S and either m(x_i) = n(y) or m(x_j) = n(y)}. Let τ_2' = <e,n'> where
n' = n for all symbols of lhs(e) not in Y and n'(y) = min{m(x_i), m(x_j)} for y ∈ Y. τ_1
is clearly enabled in τ_2(T). τ_1 equates {m(x_i), m(x_j)} everywhere they appear in
τ_2(T). The symbols of lhs(e) which are equated by τ_1 are equated by n'. So τ_2' is
enabled in τ_1(T) and τ_1(τ_2(T)) = τ_2'(τ_1(T)).

Case 3. Both d and e are tgd's. Then we may choose τ_1' = τ_1, τ_2' = τ_2. ■

To finish the proof that the completion T*_p of T_p is unique, we must deal
with the empty set. We may apply Sethi's theorem to the situation of lemma 2.3
in the case that τ_1(T_p) = τ_2(T_p) = ∅. For the remainder we have

Corollary. If τ_2(T_p) ≠ ∅ and τ_1(T_p) = ∅, then τ_1'(τ_2(T_p)) = ∅.

Proof. Letting $\tau_1 = \langle d, m \rangle$, it must be that $d$ is an egd and that $m(x_i)$, $m(x_j)$ are distinct constants. If $\tau_1' = \tau_1$, this condition persists. If $\tau_1' = \langle d, m' \rangle$ and $m'$ is distinct from $m$, then $m'(x_i) = \min\{n(x_k), n(x_l)\}$ as in case 1. But since $m(x_i) \in \{n(x_k), n(x_l)\}$ in this case and $m(x_i)$ is a constant, $m(x_i) = \min\{n(x_k), n(x_l)\}$. That is, $m' = m$ and $\tau_1' = \tau_1$. ■

Theorem 2.1. If $A_p$ has only total dependencies, the completion $T^*_p$ of $T_p$ is unique.

Proof.  By  proposition  2.1,  lemmas  2.2,  2.3  and  its  corollary.  ■ 

Theorem 2.2. A non-empty state $p$ is satisfying; that is, $A_p$ is consistent; that is, $p$ has a weak instance; if and only if $T^*_p \neq \emptyset$.

Proof. (If) Suppose $T^*_p \neq \emptyset$. Then we claim $T^*_p$ is a weak instance for $p$.† As noted in Observation 1, $T_p$ is a containing instance for $p$. The same is true for $T^*_p$. It remains to show that $T^*_p$ satisfies the dependencies of $A_p$.

† We are not concerned that $T^*_p$ contains elements outside the domains of R. Since $T^*_p$ is only countably large, we may map $T^*_p$ via an injection to an instance of the universal relation in which only constants appear.

Suppose it does not. Then some dependency $d$ in $A_p$ is not satisfied by $T^*_p$, and some minimal set $w \subseteq T^*_p$ witnesses that fact. The transformation $\langle d, m \rangle$ where $m(\mathrm{lhs}(d)) = w$ is enabled in $T^*_p$, and by definition $\langle d, m \rangle(T^*_p) \neq T^*_p$, which contradicts the definition of $T^*_p$.

(Only if) Suppose $T^*_p = \emptyset$. We must show that $A_p$ has no models; i.e., $p$ has no weak instance.

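Suppose $w$ is a weak instance for $p$. Let $g$ be a function from $\bigcup_{A \in U} \mathrm{tdom}(A)$ to the symbols of $w$ such that $g$ is the identity on constants and $\bar{g}(T_p) \subseteq w$. Such a function must exist, since $w$ is a containing instance.

Now suppose $\langle d, m \rangle$ is a transformation enabled in $T_p$. Then $\langle d, g \circ m \rangle$ is a transformation enabled in $w$. (That is, $\bar{g}(m(\mathrm{lhs}(d))) \subseteq w$.) Since $w$ is a weak instance for $p$, i.e., $w$ satisfies the dependencies in $A_p$, $\langle d, g \circ m \rangle(w) = w$. Therefore, $\bar{g}(\langle d, m \rangle(T_p)) \subseteq w$ (extending $g$ to the ndv's introduced by tgd's as needed). By induction, this reasoning can be extended to show that for any syntactically valid transformation sequence $\chi$, $\bar{g}(\chi(T_p)) \subseteq w$.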

If $T^*_p = \emptyset$, then there is some transformation sequence $\chi$ and transformation $\tau$ such that $\chi(T_p) \neq \emptyset$ and $\tau(\chi(T_p)) = \emptyset$. Thus $\tau$ is $\langle e, n \rangle$ where $e$ is an egd with rhs$(e)$ = "$x_i = x_j$" and $n(x_i)$, $n(x_j)$ are distinct constants. From the definition of $g$, $g(n(x_i))$ and $g(n(x_j))$ are distinct, and since $\bar{g}(n(\mathrm{lhs}(e))) \subseteq w$, $w$ does not satisfy $e$. Therefore $w$ is not a weak instance. ■

From  our  results  so  far  we  have  immediately: 

Theorem  2.3.  For  finite  p  and  finitely  many  total  dependencies,  the  chase  is 
a  decision  procedure  for  satisfaction.  ■ 
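Theorem 2.3 can be illustrated for the simplest total dependencies, functional dependencies (a special case of egd's). The Python sketch below is a hedged illustration under stated assumptions, not the procedure of this thesis: rows are dicts keyed by single-letter attribute names, ndv's are strings beginning with an underscore, every other value is a constant, and the helper names (tableau_for_state, chase_fds, is_satisfying) are hypothetical. When a dependency forces two symbols together, the smaller is kept (constants before ndv's), mirroring the use of min in the egd step; the chase returns None, standing for $\emptyset$, exactly when two distinct constants would have to be equated.

```python
from itertools import count

def is_ndv(x):
    # Non-distinguished variables are strings starting with "_"; other values are constants.
    return isinstance(x, str) and x.startswith("_")

def tableau_for_state(universe, state):
    """Build T_p: each tuple of p(R_i) becomes a row, padded with fresh ndv's outside R_i."""
    fresh = count()
    rows = []
    for scheme, tuples in state.items():
        for t in tuples:
            rows.append({attr: t[scheme.index(attr)] if attr in scheme
                         else f"_{attr}{next(fresh)}" for attr in universe})
    return rows

def chase_fds(rows, fds):
    """Chase with FDs given as (lhs_attrs, rhs_attr) pairs.

    Returns the chased rows, or None (standing for the empty tableau) when
    two distinct constants would have to be equated.
    """
    rows = [dict(r) for r in rows]
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            for r1 in rows:
                for r2 in rows:
                    x, y = r1[rhs], r2[rhs]
                    if x == y or any(r1[a] != r2[a] for a in lhs):
                        continue
                    if not is_ndv(x) and not is_ndv(y):
                        return None
                    # Keep the smaller symbol: constants precede ndv's; ndv's compare by name.
                    keep, drop = (x, y) if not is_ndv(x) or (is_ndv(y) and x < y) else (y, x)
                    for r in rows:
                        if r[rhs] == drop:
                            r[rhs] = keep
                    changed = True
    return rows

def is_satisfying(universe, state, fds):
    return chase_fds(tableau_for_state(universe, state), fds) is not None

FDS = [(("A",), "C"), (("B",), "C")]  # A -> C and B -> C
print(is_satisfying("ABC", {"AB": [(1, 2)], "AC": [(1, 3)], "BC": [(2, 4)]}, FDS))  # False
print(is_satisfying("ABC", {"AB": [(1, 2)], "AC": [(1, 3)], "BC": [(2, 3)]}, FDS))  # True
```

Because each step removes one symbol from the tableau, the loop terminates on finite states, which is the property theorem 2.3 relies on.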

The reduction of $T_p$ to $\emptyset$ by a transformation sequence can be reconstructed as a proof of the inconsistency of $A_p$ by interpreting the ndv's as existentially bound variables.

Proposition 2.2. Let $b_1, \ldots, b_k$ be the ndv's appearing in $\chi(T_p)$ for some finite sequence $\chi$. Let $u_1, \ldots, u_n$ be the rows of $\chi(T_p)$. Then

$$A_p \vdash (\exists b_1 \cdots b_k)(P(u_1) \wedge \cdots \wedge P(u_n))$$

Proof. We proceed by induction on the length of $\chi$.

Basis. For $\chi$ of length 0, the hypothesis follows by the schema axioms.

Induction. Assume the hypothesis holds for all transformation sequences of length $k$, and let $\chi$ be of length $k+1$. If $\langle d, m \rangle$ is the final transformation of $\chi$, then we have, for $c_1, \ldots, c_l$ the ndv's of $m(\mathrm{lhs}(d))$,

$$A_p \vdash (\exists c_1 \cdots c_l)(P(u_1) \wedge \cdots \wedge P(u_n))$$

for the rows $u_1, \ldots, u_n$ of $m(\mathrm{lhs}(d))$, by hypothesis. Therefore

$$A_p \vdash (\exists c_1 \cdots c_l, c_{l+1} \cdots c_{l+j})(P(u_1) \wedge \cdots \wedge P(u_n) \wedge P(v_1) \wedge \cdots \wedge P(v_p))$$

when $d$ is tuple generating, where $m(\mathrm{rhs}(d)) = \{v_1, \ldots, v_p\}$ and $j$ symbols of rhs$(d)$ do not appear in lhs$(d)$, by the axiom for $d$ in $A_p$ and modus ponens. If $d$ is equality generating, then we complete the induction by $d \in A_p$, modus ponens, and the substitutivity of equality. ■

This proposition applies to all finite chase sequences, including those built with partial dependencies. We demonstrate below that only finite sequences need be considered for inconsistent $A_p$. Pending this demonstration, let $\tau(\xi(T_p)) = \emptyset$ with $\xi(T_p) \neq \emptyset$, $\xi$ finite and $\tau$ an egd transformation. Then by proposition 2.2, $A_p \vdash c_i = c_j$ for distinct constants $c_i$, $c_j$. But $c_i \neq c_j$ is an axiom of $A_p$. So $A_p \vdash c_i = c_j \wedge c_i \neq c_j$, and $A_p$ is inconsistent.

Now let us suppose that $A_p$ contains some partial dependencies. In particular, suppose $d$ is such a dependency and $\langle d, m \rangle$ is enabled in $T_p$. Then there exists an infinitely long syntactically valid sequence $\chi = \langle \langle d, m_1 \rangle, \langle d, m_2 \rangle, \ldots \rangle$ where for all $i$, $m_i(\mathrm{lhs}(d)) = m(\mathrm{lhs}(d))$. (If no two transformations on partial dependencies introduce the same symbol into the tableau, each of these transformations has an effect.) If $A_p$ is inconsistent, the existence of $\chi$ shows that its inconsistency may not have been demonstrated after only finitely many steps of the chase. The existence of $\chi$ allows for infinite delay. This infinite delay is useless, in the sense that no transformation requires infinitely many transformations to enable it. This is simple to prove; however, the proof requires the theory of ordinal numbers, for which see [Suppes] or any set theory text.

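Let $\chi$ be a transformation sequence and $\alpha$ an ordinal. The $\alpha$-length prefix $\chi_\alpha$ of $\chi$ is defined by

• $\chi_0$ is the empty sequence;

• if $\alpha$ is not a limit ordinal, then $\alpha$ is the successor of $\beta$ and $\chi_\alpha = \langle \chi_\beta, \tau \rangle$ where $\chi_\alpha$ is a prefix of $\chi$;

• if $\alpha$ is a limit ordinal, then $\chi_\alpha = \bigcup_{\beta < \alpha} \chi_\beta$.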

Lemma 2.4. Let $\tau$ be a transformation appearing in a sequence $\chi$ valid for some tableau $T$. Then there exists a finite sequence $\xi$ such that $\tau$ is enabled in $\xi(T)$.

Proof. Assume this is not the case. Let $\beta$ be the least ordinal such that there exists some transformation $\eta$ such that, for some syntactically valid $\chi$ for $T$, $\eta$ is enabled in $\chi_\beta(T)$, but for all valid $\xi$ for $T$, $\eta$ is not enabled in $\xi_k(T)$ for any $k < \omega$.

Let $\eta = \langle d, m \rangle$. We have $m(\mathrm{lhs}(d)) \subseteq \chi_\beta(T)$. Therefore, for each row of $m(\mathrm{lhs}(d))$, the transformation introducing it into $\chi_\beta(T)$ is enabled in $\chi_\alpha(T)$ for some $\alpha < \beta$. Therefore, by the minimality of $\beta$, this transformation appears in some finite length transformation sequence. Since $m(\mathrm{lhs}(d))$ is finite, we may construct a finite sequence from the finite number of finite sequences we have discovered, which enables $\eta$. This contradicts our choice of $\beta$. ■


From lemma 2.4, we know that if there exists a sequence reducing $T_p$ to $\emptyset$, then there exists a finite sequence with that effect. We wish to show that if there is no finite sequence reducing $T_p$ to $\emptyset$, then $A_p$ is consistent. We do this as in theorem 2.2, by using a chase sequence to construct a standard model for $A_p$.

Suppose a transformation $\tau$ is enabled in $\chi_k(T)$ for $k$ an integer. If the $(k+1)$st transformation of $\chi$ is $\eta$, then unless $\chi_{k+1}(T) = \emptyset$, there exist, by lemma 2.3, transformations $\eta'$, $\tau'$ such that $\tau'(\eta(\chi_k(T))) = \eta'(\tau(\chi_k(T)))$. When there are partial dependencies involved, we must weaken this condition to equivalence, where $T$ is equivalent to $T'$ if $T'$ comes from a 1-to-1 renaming of the ndv's of $T$. For convenience, we continue to use the symbol "=" for tableaux related in this way.

This condition persists: for all integers $l > k$, writing $\zeta$ for the segment of $\chi$ from the $(k+1)$st through the $l$th transformation, we may find $\tau''$, $\zeta'$ such that $\tau''(\zeta(\chi_k(T))) = \zeta'(\tau(\chi_k(T)))$. We say $\tau$, $\tau'$, $\tau''$ are equivalent transformations. A transformation sequence is said to have finite delay if for every integer $k$, for each transformation $\tau$ enabled in $\chi_k(T)$, there exists an integer $l$ such that either $\chi_{k+l}(T) = \emptyset$ or a transformation equivalent to $\tau$ appears in $\chi_{k+l}$. We say that a tableau $T$ is homomorphically embeddable in a tableau $T'$ if there exists a function $h$ on tableau symbols such that $h(c) = c$ for every constant (domain element) $c$, and $\bar{h}(T) \subseteq T'$.
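Homomorphic embeddability, as just defined, can be tested on small finite tableaux by backtracking over the rows of $T$. In this hypothetical sketch a tableau is a list of tuples over a fixed attribute order, ndv's are strings beginning with an underscore, and all other values are constants, which $h$ must fix; because the tableau domains of distinct attributes are disjoint, $h$ is recorded per column. The search is exponential in the worst case and is meant only to make the definition concrete.

```python
def is_ndv(x):
    # ndv's are strings beginning with "_"; every other value is a constant.
    return isinstance(x, str) and x.startswith("_")

def embedding(T, T2):
    """Return h, a dict from (column, ndv) to symbol, with h(T) a subset of T2, or None."""
    rows, targets = list(T), list(T2)

    def fits(row, cand, h):
        # Try to map row onto cand, extending h in place; report success.
        for col, (x, y) in enumerate(zip(row, cand)):
            if not is_ndv(x):
                if x != y:  # h(c) = c for every constant c
                    return False
            elif h.setdefault((col, x), y) != y:  # an ndv keeps one image
                return False
        return True

    def extend(i, h):
        if i == len(rows):
            return h
        for cand in targets:
            new_h = dict(h)
            if fits(rows[i], cand, new_h):
                found = extend(i + 1, new_h)
                if found is not None:
                    return found
        return None

    return extend(0, {})

print(embedding([(1, "_b"), ("_a", "_b")], [(1, 2), (3, 2)]))  # {(1, '_b'): 2, (0, '_a'): 1}
print(embedding([(1, "_b"), (3, "_b")], [(1, 2), (3, 5)]))      # None
```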

Lemma 2.5. If $\xi$ is any finite sequence valid for a tableau $T$ and $\chi$ is a sequence having finite delay, then $\xi(T)$ is homomorphically embeddable in $\chi(T)$.

Proof. By induction on the length of $\xi$. The basis is the empty sequence (length 0). In this case, every row of $T$ is taken to the row of $\chi(T)$ to which it corresponds.

For the induction, suppose $\xi = \langle \zeta, \tau \rangle$. Then there exists a homomorphic embedding $h$ of $\zeta(T)$ into $\chi(T)$ by the induction hypothesis. Let $\tau = \langle d, m \rangle$. Then $\bar{h}(m(\mathrm{lhs}(d))) \subseteq \chi(T)$, so $\langle d, h \circ m \rangle$ is enabled in $\chi(T)$. If $d$ is total, then $h$ is a homomorphic embedding of $\xi(T)$ into $\chi(T)$, since by the finite delay property some transformation of $\chi$ sets $h(m(x)) = h(m(y))$ where rhs$(d)$ = "$x = y$" when $d$ is an egd, or generates $\bar{h}(m(\mathrm{rhs}(d)))$ when $d$ is a tgd. If $d$ is partial, then let $h'$ agree with $h$ on all symbols of $\zeta(T)$ and set $h'(m(x)) = n(x)$ for symbols $x$ of rhs$(d)$ not in lhs$(d)$, where $\langle d, n \rangle$ is the transformation equivalent to $\langle d, h \circ m \rangle$ which appears in $\chi$ by finite delay. ■

Lemma 2.6. Suppose no finite transformation sequence reduces $T_p$ to $\emptyset$. Let $\chi$ be a sequence having finite delay. Then $\chi(T_p)$ is a weak instance for $p$.

Proof. As in the if portion of theorem 2.2. If $\chi(T_p)$ did not satisfy the dependencies in $A_p$, some transformation $\langle d, m \rangle$ would modify $\chi(T_p)$. By lemma 2.4, $\langle d, m \rangle$ appears in some finite sequence $\xi$, and as $\xi(T_p)$ is not empty and is embeddable in $\chi(T_p)$ (lemma 2.5), $\langle d, m \rangle$ would not modify $\chi(T_p)$. ■

Theorem  2.4.  The  set  of  database  states  which  do  not  satisfy  a  given  set  of 
dependencies  is  recursively  enumerable. 

Proof.  We  may  construct  a  chase  process  which  generates  only  finite  delay 
transformation  sequences.  This  may  be  done,  for  example,  by  imposing  a  total 
order  on  the  set  of  all  transformations  and  applying  a  given  transformation  only 
if all lesser transformations have been attempted. Details are omitted. ■
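The proof above leaves the scheduling details open. One standard way to obtain fair scheduling, offered here as an assumption-laden sketch rather than the author's total-order scheme, is dovetailing: enumerate the countably many transformations as 0, 1, 2, ... and in stage $n$ attempt transformations 0 through $n$. Every transformation is then attempted infinitely often, with only finitely many other attempts in between. A driver (not shown) would apply the transformation at each yielded index if it is enabled and has an effect, and otherwise skip it; the sketch fixes only the order of attempts and does not by itself handle transformations whose symbols are renamed by egd's, which is where the equivalence of transformations defined above comes in.

```python
from itertools import islice

def dovetail_schedule():
    """Yield transformation indices stage by stage: 0; 0, 1; 0, 1, 2; ...

    Index i is first attempted in stage i and again in every later stage,
    so between consecutive attempts at i only finitely many other attempts occur.
    """
    stage = 0
    while True:
        yield from range(stage + 1)
        stage += 1

print(list(islice(dovetail_schedule(), 10)))  # [0, 0, 1, 0, 1, 2, 0, 1, 2, 3]
```

Index $i$ is first attempted at position $i(i+1)/2 + i$ of the schedule, a finite bound for every $i$.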

Theorem 2.4 is also readily proven from the logical point of view. The theory $A_p$ is inconsistent if and only if it has consequences $\alpha$ and $\neg\alpha$ for some formula $\alpha$. By the completeness of the inference rules of predicate calculus, the set of consequences of $A_p$ is recursively enumerable. Therefore the inconsistency of $A_p$ is partially decidable, and theorem 2.4 follows immediately.

Testing satisfaction may be thought of as a special case of the implication problem for data dependencies. This problem is: given a set $D$ of dependencies and a dependency $d$, is it the case that every relation instance satisfying $D$ satisfies $d$? This problem has been open for some time [CKM] [F4] [BV1] [BV2].

For a state $p$, consider the structure $d = \langle T_p, c_i = c_j \rangle$ where $c_i$, $c_j$ are distinct constants of $T_p$. $d$ is (isomorphic to) a data dependency, in particular an egd. By proposition 2.2, $p$ is consistent if and only if $d$ is not implied by the set $D$, the dependencies of $A_p$. The decidability of satisfaction implies and is implied by the decidability of the implication problem for egd's from a set of dependencies. This subcase is also open.

There are situations in which satisfaction is trivially decidable. We cannot derive a contradiction from $A_p$ (equivalently, we cannot reduce $T_p$ to $\emptyset$) unless there is an equality generating dependency in $A_p$. The following result is therefore immediate.

Lemma 2.7. If $A_p$ contains only tuple generating dependencies, then it is consistent. ■

2.4.  Completeness 

A theory is normally said to be complete whenever, for any sentence $\sigma$, either $\sigma$ or $\neg\sigma$ is a consequence of the theory. All models of a complete theory are elementarily equivalent; that is, each is a model of the same set of sentences. It is easy to show that whenever $p$ is finite and satisfying, $A_p$ is not complete in this sense.

Since $p$ is finite, we may construct a tuple $t$ over $U$ from constants not appearing in any $p(R_i)$. Let $M$ be a standard model for $A_p$. If $P(t)$ is true in $M$, i.e., the tuple $t$ appears in the interpretation $P$ of $M$, then we may form $M'$ as the image of $M$ under an isomorphism which renames the constants of $t$ and preserves all other constants. If $P(t)$ is false in $M$, we may add the tuple $t$ to the interpretation $P$ to produce $M'$. $M$ and $M'$ are models of $A_p$ that are not elementarily equivalent, since in particular $P(t)$ is true in one but not the other. Therefore $A_p$ is not complete.


This  notion  of  completeness  is  not  necessarily  the  proper  one  for  the  study 
of  databases.  We  can  find  a  more  interesting  notion  by  considering  how  our 
definition  of  satisfaction  differs  from  the  old  one,  which  from  now  on  we  will  call 
strong satisfaction.

We call the situation in which $R = \{U\}$ the one relation case. In the one relation case, a state $p$ is strongly satisfying precisely when $p(U)$ is itself a model of $A_p$. If the dependencies in $A_p$ are all equality generating, the same is true when $p$ is satisfying. However, if some of the dependencies are tuple generating, the two definitions diverge. In fact, if only tgd's are present, by lemma 2.7, all $p$ are satisfying.

The  difference  between  satisfaction  and  strong  satisfaction  has  an  intuitive 
explication.  Satisfaction  does  not  require  every  fact  which  is  implied  by  what  is 
known  to  be  recorded  in  the  database.  Strong  satisfaction  does.  This  difference 
can  be  formalized. 

For a satisfying database state $p$, the completion $p^*$ of $p$ is given by

$$p^*(R_i) = \bigcap \{\, \pi_{R_i}(w) \mid w \text{ is a weak instance for } p \,\}$$

Definition. A state $p$ is said to be tuple-complete, or just complete, if $p = p^*$.

It  now  becomes  clear  that  strong  satisfaction  is  the  conjunction  of  two 
ideas. 

Proposition 2.3. In the one relation case, if a state $p$ is strongly satisfying, then it is satisfying and complete.

Proof. Let $W(p)$ be the set of all weak instances for $p$. Suppose $p$ is strongly satisfying. Then it is satisfying. Further, $p(U) \subseteq w$ for every $w \in W(p)$ by the definition of weak instance. But $p(U) \in W(p)$. So $p(U) \supseteq \bigcap W(p) \supseteq p(U)$, and $p$ is complete. ■


The converse of proposition 2.3 also holds when the dependencies are all total. In this case $W(p)$ is closed under intersection: if two elements of $W(p)$ share a set of rows instantiating the left hand side of some dependency, they share the uniquely defined set of rows generated by the right hand side. Thus if $p$ is complete, $p(U) \in W(p)$. If some dependencies are partial, the converse of proposition 2.3 may fail.

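Example 2.4. Consider the partial dependency $d$ given by

$$\mathrm{lhs}(d) = \begin{array}{cccc} a_1 & b_1 & c_1 & d_1 \\ a_2 & b_1 & c_2 & d_2 \end{array} \qquad \mathrm{rhs}(d) = \begin{array}{cccc} a_1 & b_1 & c_2 & d_3 \end{array}$$

Let $p$ be any isomorphic image of lhs$(d)$. Elements of $W(p)$ are free to instantiate $d_3$ in any way. Thus $p$ is satisfying and complete but not strongly satisfying. ■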


2.5.  Does  the  Universal  Relation  Scheme  Assumption  Hold? 

We  end  this  chapter  with  some  comments  on  the  validity  of  the  universal 
relation  scheme  assumption. 

Consider the problem of storing the following relationships among persons, cars and addresses:

person  p  owns  car  c 

person  p  lives  at  address  a 

car  c  is  kept  at  address  a 


The universal predicate, if it has a meaning, is the conjunction of these predicates. Every tuple of a relation for the universal predicate will assert that an address is simultaneously the residence of some person and the garage of some car. In short, people must keep their cars at their homes and nowhere else. This is a constraint on the real world. Does it hold or does it not?


The theory of databases is incapable of answering this question. The theorist must turn to the user for any fact about the user's world view. It is the height of arrogance to limit the user's response a priori. To be useful, the theory must contend with whatever analysis of the real world the user has made. It is the duty of the theorist to inform the user of some potentially undesirable properties of that analysis. The theorist has no right to replace that analysis with one of his own.

Returning to our example, how should we respond to the user who maintains that this constraint does not describe his conception of persons, cars and addresses? Suppose that, to represent these predicates, the user has chosen the schemes PC, PA and CA. In the absence of any dependencies, this constraint on where people may keep their cars is not a consequence of $A_p$. If we have $\langle p, c \rangle \in p(PC)$, $\langle p, a \rangle \in p(PA)$, $\langle c, a' \rangle \in p(CA)$, we may not conclude that the PA-predicate holds for $pa'$ nor that CA holds for $ca$, unless those tuples appear in $p$. Given that there are no dependencies, the completion of $p$ is just $p$. It is the tuples of the completion which assert precisely those ground instances which must be true, given the truth of $p$. Although each weak instance for $p$ represents the constraint on where people may keep their cars, since we aver the set of weak instances, rather than any one weak instance, to be the information stored by $p$, this constraint does not hold. Our approach is partially successful in supporting the user who wishes to deny the constraint.

Suppose, however, that the user wishes to store only principal residences and primary garages for any person or car. In short, she wishes the functional dependencies $P \to A$, $C \to A$ to be enforced. Given the tuples $\langle p, c \rangle$, $\langle p, a \rangle$, $\langle c, a' \rangle$ as above, we must have $a = a'$. People must keep their cars at home. We have constrained the database in a way the user may not wish.
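The forced equation can be seen in a worked chase of $T_p$ for the three tuples above (columns $P$, $C$, $A$), with $b_1$, $b_2$, $b_3$ as illustrative names for the padding ndv's. $P \to A$ applied to the first two rows replaces $b_1$ by the constant $a$; $C \to A$ then compares the first and third rows:

$$
T_p = \begin{array}{ccc} P & C & A \\ \hline p & c & b_1 \\ p & b_2 & a \\ b_3 & c & a' \end{array}
\quad \xrightarrow{\;P \to A\;} \quad
\begin{array}{ccc} P & C & A \\ \hline p & c & a \\ p & b_2 & a \\ b_3 & c & a' \end{array}
\quad \xrightarrow{\;C \to A\;} \quad
\begin{cases} \emptyset & \text{if } a \neq a' \\ \text{no change} & \text{if } a = a' \end{cases}
$$

With $a \neq a'$ the chase reaches $\emptyset$, so by theorem 2.2 the state is not satisfying; the dependencies therefore force $a = a'$.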

The standard solution to this problem is to rename attributes [B] [FMU] so that address is divided into Home-address and Garage-address. We note that renaming cannot be accomplished in general in a purely syntactic way. If, instead of the functional dependencies, we are given the join dependency *[PC, PA, CA], the database will again require people to keep their cars everywhere they live and live everywhere they keep their cars. In this join dependency, the attributes are indistinguishable syntactically, and there is no justification for splitting address rather than person or car. More significantly, a formal method of attribute splitting is the replacement of the user's analysis with the theorist's and is therefore illegitimate. Of course, the user may wish to rename when confronted with the difficulties of not renaming. Attribute renaming, by virtue solely of its effect of increasing the name space, makes other activities, such as query formulation, more difficult. Can the user avoid unwanted constraints without renaming?

We have insisted that the meaning of the universal predicate be the conjunction of the predicates for the schemes in the database schema. Suppose that instead, we allow it to be a disjunction of conjunctions of such predicates. In our example, suppose that a tuple $\langle p, c, a \rangle$ of the universal relation is interpreted as

person p owns car c and lives at a or

person p owns car c which is kept at a

For a database on {PC, CA, PA}, the ambiguity of interpreting a tuple of PCA may not arise. The set of tuples of the first conjunct forms a weak instance for {p(PC), p(PA)}; similarly for the second conjunct and {p(PC), p(CA)}. The sets {PC, PA} and {PC, CA} are maximal subsets of {PC, PA, CA} within which the universal relation scheme assumption holds. The theory presented in this thesis can be applied to each set independently.


The consequences of such a weakening of the universal relation scheme assumption remain to be explored. A language such as Lr must be evolved and a means of specifying the dependencies developed. It may be that some dependencies will span conjuncts of a disjunctive universal predicate. How such dependencies can be stated and tested in a given database is uncertain.

The  approach  of  "maximal  objects"  as  put  forward  by  Maier  and  Ullman 
[MU]  may  be  expressible  in  terms  of  disjunctive  universal  predicates.  This  would 
add  emphasis  to  research  in  the  area. 


Chapter  3 

Strong  Expression  Equivalence 


3.1.  Introduction. 

In this chapter we will extend the work of Aho et al. [ASU] concerning the equivalence of relational algebra expressions. They discussed two different notions of equivalence. Two relational algebra expressions $E_1$, $E_2$ are said to be weakly equivalent if $E_1(p) = E_2(p)$ for any join-consistent database state $p$. Requiring equality over all states, rather than just the join-consistent ones, gives a stricter notion called strong equivalence. For example, $\pi_{AB}(AB * BC)$ is weakly equivalent to $AB$, but not strongly equivalent. As pointed out in [ASU], when concerned with weak equivalence we can take the convenient approach of viewing expressions as defined on a single universal relation rather than on several relation variables.
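A small computation shows why the equivalence in this example is only weak. The sketch below uses plain Python sets of tuples; natural_join and project are hypothetical helpers written for this illustration, not taken from [ASU]. It evaluates $\pi_{AB}(AB * BC)$ on a state in which the AB-tuple (5, 6) has no partner in BC, so the state is not join-consistent and that tuple is lost.

```python
def natural_join(r, r_attrs, s, s_attrs):
    """Natural join of two relations (sets of tuples) with attribute strings r_attrs, s_attrs."""
    common = [a for a in r_attrs if a in s_attrs]
    out_attrs = r_attrs + "".join(a for a in s_attrs if a not in r_attrs)
    out = set()
    for t in r:
        for u in s:
            if all(t[r_attrs.index(a)] == u[s_attrs.index(a)] for a in common):
                values = dict(zip(r_attrs, t))
                values.update(zip(s_attrs, u))
                out.add(tuple(values[a] for a in out_attrs))
    return out, out_attrs

def project(r, r_attrs, attrs):
    """Projection of relation r onto the attributes in attrs."""
    return {tuple(t[r_attrs.index(a)] for a in attrs) for t in r}

AB = {(1, 2), (5, 6)}  # (5, 6) has no matching B-value in BC
BC = {(2, 3)}
joined, attrs = natural_join(AB, "AB", BC, "BC")
print(project(joined, attrs, "AB"))  # {(1, 2)}, which differs from AB
```

On the join-consistent state obtained by dropping (5, 6) from AB, the two expressions agree, as weak equivalence requires.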

Aho et al. give a means of representing relational algebra expressions as tableaux. Following [CM], they prove that two expressions are weakly equivalent if and only if there exists a pair of containment mappings between their tableaux. For strong equivalence, a similar result holds provided that a modified form of tableaux called tagged tableaux is used.

When a set of dependencies is given, it may be used as the basis of a chase procedure on the tableau for the expression. If we wish to test whether two expressions are weakly equivalent over the set of universal relations satisfying some set of dependencies, it suffices to chase their tableaux and test for the existence of containment mappings [ASU]. Thus the semantic question of equivalence of two expressions with respect to a set of dependencies is transformed to the syntactically answerable question of unrestricted weak equivalence.


When strong equivalence is of interest, Aho et al. give no clear meaning to the chase of the tagged tableau of an expression. The reason is that they do not provide an explicit notion of a database state satisfying a set of dependencies. Having provided such a notion, we may now provide a justification for performing the chase on tagged tableaux. Thus the question of whether two expressions agree on the class of satisfying database states can be answered by checking for the existence of containment mappings between the chases of their tagged tableaux.

3.1.1.  An  Example. 

Consider the schema {AB, BC, AC} and the following expressions over it.

$$E_1 = (\pi_{AB}(AC * BC)) * AB$$

$$E_2 = ((\pi_A(AB * AC)) * (\pi_B(AB * BC))) * AB$$

The tagged tableaux for these expressions are given below; a blank entry stands for a non-distinguished variable appearing nowhere else in the tableau.

$$T_{E_1}: \begin{array}{ccc|l} A & B & C & \mathit{Tag} \\ \hline a & a & & \text{summary} \\ a & & b_1 & AC \\ & a & b_1 & BC \\ a & a & & AB \end{array}$$

$$T_{E_2}: \begin{array}{ccc|l} A & B & C & \mathit{Tag} \\ \hline a & a & & \text{summary} \\ a & b_1 & & AB \\ a & & b_2 & AC \\ b_3 & a & & AB \\ & a & b_4 & BC \\ a & a & & AB \end{array}$$

$E_2$ returns the subset of AB in which the A-value appears in AC and the B-value appears in BC. $E_1$ returns the subset of that subset in which the values appear with the same C-value. Thus $E_2$ contains $E_1$. Consider requiring states of this schema to satisfy $\{A \to C, B \to C\}$. Aho et al. suggest that a functional dependency may be applied to two rows both of which embed the dependency. Nothing can be done to either of these tableaux in this way. We will show that these expressions are equivalent over the set of database states satisfying $\{A \to C, B \to C\}$ as we have defined that set.
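Before the formal development, the claim can be checked by brute force on a small domain. The Python sketch below is an illustration under stated assumptions, not a proof: it enumerates every state whose three relations are subsets of $\{0,1\}^2$, evaluates $E_1$ and $E_2$ directly as set comprehensions rather than through a general algebra evaluator, and decides satisfaction by chasing $T_p$ under $A \to C$ and $B \to C$ (ndv's are strings, constants are integers; all names are hypothetical). It confirms that $E_2 \supseteq E_1$ on every state and that the two differ only on states which are not satisfying.

```python
from itertools import chain, combinations, product

PAIRS = list(product((0, 1), repeat=2))  # all binary tuples over {0, 1}

def subsets(xs):
    """All subsets of xs, as Python sets."""
    return [set(c) for c in chain.from_iterable(combinations(xs, k) for k in range(len(xs) + 1))]

def e1(AB, AC, BC):
    # pi_AB(AC * BC) * AB: AB-tuples whose A- and B-values share a C-value.
    return {(a, b) for (a, b) in AB if any(x == a and (b, c) in BC for (x, c) in AC)}

def e2(AB, AC, BC):
    # (pi_A(AB * AC) * pi_B(AB * BC)) * AB: A-value occurs in AC, B-value occurs in BC.
    return {(a, b) for (a, b) in AB
            if any(x == a for (x, _) in AC) and any(y == b for (y, _) in BC)}

def satisfies_fds(AB, AC, BC):
    """Chase T_p under A -> C and B -> C; return False iff the chase reaches the empty tableau."""
    rows = ([[a, b, f"_c{i}"] for i, (a, b) in enumerate(AB)]
            + [[a, f"_b{i}", c] for i, (a, c) in enumerate(AC)]
            + [[f"_a{i}", b, c] for i, (b, c) in enumerate(BC)])
    changed = True
    while changed:
        changed = False
        for key in (0, 1):  # left-hand column: 0 for A -> C, 1 for B -> C
            for r1, r2 in product(rows, rows):
                x, y = r1[2], r2[2]
                if r1[key] == r2[key] and x != y:
                    if isinstance(x, int) and isinstance(y, int):
                        return False  # two distinct constants would have to be equal
                    # Keep the smaller symbol: a constant beats an ndv; ndv's compare by name.
                    keep, drop = (x, y) if isinstance(x, int) or (isinstance(y, str) and x < y) else (y, x)
                    for r in rows:
                        if r[2] == drop:
                            r[2] = keep
                    changed = True
    return True

def check():
    """Return the number of states on which E1 and E2 differ (all must be unsatisfying)."""
    differ = 0
    for AB, AC, BC in product(subsets(PAIRS), repeat=3):
        E1, E2 = e1(AB, AC, BC), e2(AB, AC, BC)
        assert E1 <= E2  # E2 contains E1 on every state
        if E1 != E2:
            differ += 1
            assert not satisfies_fds(AB, AC, BC)
    return differ

print(check())
```

The enumeration covers only a two-element domain, so it can expose a counterexample but cannot replace the general argument developed below.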

3.2. Tagged Tableaux and Valuations

We now introduce a variant on the tableaux of chapter 2 and show how they can be used to represent relational algebra expressions. A tagged tableau consists of a body and a summary row. The body is a relation over $U' = U \cup \{\mathit{Tag}\}$ where $U$ is the universe of attributes for some schema $R$. The tableau domain of $A \in U$, $\mathrm{tdom}(A)$, is the disjoint union of $\mathrm{dom}(A)$, the set $\{a\}$, where $a$ is called the distinguished variable (dv) for $A$, and a countable set $\mathrm{Ndv}(A)$ of non-distinguished variables (ndv's) for $A$. Again $\mathrm{tdom}(A) \cap \mathrm{tdom}(B) = \emptyset$ for $A \neq B$. The tag domain is $\mathrm{tdom}(\mathit{Tag}) = R \cup \{U\}$. The elements of $\mathrm{tdom}(A)$ for $A \in U$ are ordered by a partial order $\leq$ such that

• all elements of $\mathrm{dom}(A)$ are pairwise incomparable;

• $c < v$ for $c \in \mathrm{dom}(A)$, $v \in \mathrm{tdom}(A) - \mathrm{dom}(A)$;

• $a < b$ for $a$ the distinguished symbol, $b \in \mathrm{Ndv}(A)$;

• $\mathrm{Ndv}(A)$ is an equivalence class under $\leq$.

The elements of $\mathrm{tdom}(\mathit{Tag})$ are pairwise incomparable under $\leq$.

The summary row of a tableau is a row over a subset of $U$ called the target relation scheme. Where it is defined, the summary row may contain only distinguished variables and constants which appear in the body of the tableau. A tableau represents a mapping from states or tableaux to relations over the target relation scheme. The value of this mapping is determined by the set of valuation functions, which we now define.


Let $\nu$ be a function which for each $A \in U'$ takes $\mathrm{tdom}(A)$ to itself. Further, we require that $\nu$ respect $\leq$; that is, for $A \in U'$, $c \in \mathrm{tdom}(A)$, $\nu(c) \leq c$. (Thus $\nu$ is the identity on $\mathrm{tdom}(\mathit{Tag})$.) As before, let $\bar{\nu}$ be the set and tuple-wise extension of $\nu$, i.e., $\bar{\nu}(t) = \langle \nu(t[A_1]), \ldots, \nu(t[A_n]) \rangle$ and $\bar{\nu}(\{t_1, \ldots, t_k\}) = \{\bar{\nu}(t_1), \ldots, \bar{\nu}(t_k)\}$. Now we may define ($T$, $T'$ are tableaux, $p$ a state)

• $\nu$ is a valuation from $T$ to $p$ if $\bar{\nu}(r)[r[\mathit{Tag}]] \in p(r[\mathit{Tag}])$ for every $r \in T$. (Thus if $T$ contains a row whose tag is not an element of $R$, no such valuations exist.)

• $\nu$ is a valuation from $T$ to $T'$ if $\bar{\nu}(T) \subseteq T'$.

We can alleviate the confusion of having two kinds of valuation function by using the tagged tableau for a state in place of the state itself. For a state $p$ the tagged tableau for $p$ will be denoted $T_p$ and is defined as follows:

For every relation $p(R_i)$ and every tuple $t \in p(R_i)$ there is a row $u$ of $T_p$ with

• $u[R_i] = t$

• $u[B] \in \mathrm{Ndv}(B)$ appearing nowhere else in $T_p$, for $B \in U - R_i$

• $u[\mathit{Tag}] = R_i$

The summary row of $T_p$ is everywhere undefined; i.e., the target relation scheme is empty. We write $\nu: T \to \sigma$ if $\nu$ is a valuation of $T$ into $\sigma$. It is obvious that $\nu$ is a valuation $\nu: T \to p$ iff $\nu: T \to T_p$. The value of a tableau $T$ on a state or tableau $\sigma$ is

$$T(\sigma) = \{\, \bar{\nu}(s) \mid s \text{ is the summary row for } T \text{ and } \nu: T \to \sigma \text{ is a valuation function} \,\}$$

For notational convenience, we allow the empty tableau, denoted $\emptyset$, whose body is the empty set. Evaluation of the empty tableau is the special case

$$\emptyset(\sigma) = \emptyset \text{ for any } \sigma$$

Fact 1. For every expression $E$ over select, project and join there is a tableau $T_E$ such that for every state $p$, $T_E(p) = E(p)$.

Proof. See [ASU] Theorem 1.
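Fact 1 can be exercised on the tableau $T_{E_1}$ of section 3.1.1. The sketch below evaluates a tagged tableau on a state by enumerating valuations, following the definition of $T(\sigma)$ above. It assumes rows are dicts with a "Tag" entry naming a relation scheme, variables are strings, constants are integers, and the blank entries of $T_{E_1}$ are given the hypothetical names b2, b3, b4; variables are keyed by (attribute, symbol) because the tableau domains of distinct attributes are disjoint.

```python
def evaluate(body, summary, state):
    """Return T(p): the images of the summary row under all valuations into the state p.

    body: list of rows, each a dict from attribute to symbol plus a "Tag" entry;
    summary: dict from target attribute to a variable; state: dict scheme -> set of tuples.
    """
    results = set()

    def extend(i, nu):
        if i == len(body):
            results.add(tuple(nu[(attr, sym)] for attr, sym in summary.items()))
            return
        row = body[i]
        scheme = row["Tag"]
        for t in state.get(scheme, ()):  # a tag outside the schema admits no valuation
            new_nu = dict(nu)
            ok = True
            for attr, value in zip(scheme, t):
                sym = row[attr]
                if isinstance(sym, int):
                    ok = sym == value  # constants must match exactly
                else:
                    ok = new_nu.setdefault((attr, sym), value) == value
                if not ok:
                    break
            if ok:
                extend(i + 1, new_nu)

    extend(0, {})
    return results

# Tagged tableau of E1 = (pi_AB(AC * BC)) * AB; b2, b3, b4 name its blank entries.
T_E1 = [
    {"A": "a", "B": "b2", "C": "b1", "Tag": "AC"},
    {"A": "b3", "B": "a", "C": "b1", "Tag": "BC"},
    {"A": "a", "B": "a", "C": "b4", "Tag": "AB"},
]
SUMMARY = {"A": "a", "B": "a"}

STATE = {"AB": {(1, 2), (1, 5)}, "AC": {(1, 3)}, "BC": {(2, 3), (5, 4)}}
print(evaluate(T_E1, SUMMARY, STATE))  # {(1, 2)}
```

Here (1, 5) is excluded because 1 and 5 do not occur with a common C-value, exactly as the direct reading of $E_1$ predicts.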

3.3.  Containment 

An expression $E_1$ is said to contain an expression $E_2$, written $E_1 \supseteq E_2$, if for every state $p$, $E_1(p) \supseteq E_2(p)$. Similarly for tableaux: $T_1 \supseteq T_2$ if for every $\sigma$ (state or tableau), $T_1(\sigma) \supseteq T_2(\sigma)$. Equivalence is containment in both directions; i.e., $E_1 \equiv E_2$ ($T_1 \equiv T_2$) abbreviates $E_1 \supseteq E_2$ and $E_2 \supseteq E_1$ ($T_1 \supseteq T_2$ and $T_2 \supseteq T_1$).

A valuation $\nu: T_1 \to T_2$ is a containment mapping if $\bar{\nu}(s)$ ($s$ the summary of $T_1$) is the summary of $T_2$.

Fact 2. $T_1 \supseteq T_2$ iff there exists $\nu: T_1 \to T_2$, $\nu$ a containment mapping. Thus $E_1 \supseteq E_2$ iff $T_{E_1} \supseteq T_{E_2}$ iff there exists $\nu: T_{E_1} \to T_{E_2}$, $\nu$ a containment mapping.

Proof. [ASU] Theorem 2.
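Fact 2 suggests a direct test. The following sketch (hypothetical names; the dv written "a", ndv's as other strings, constants as integers, blank entries of section 3.1.1's tableaux given illustrative names) searches for a containment mapping. It requires that tags be preserved, that the summary be mapped onto the summary, and that each symbol go to a symbol at or below it in $\leq$, so a dv may go only to itself or to a constant. Applied to the tableaux of $E_1$ and $E_2$, it finds a mapping from $T_{E_2}$ to $T_{E_1}$ (so $E_2 \supseteq E_1$ strongly) but none in the other direction, since $b_1$ in $T_{E_1}$ would have to go to both $b_2$ and $b_4$.

```python
def allowed(x, y):
    """May nu send x to y while respecting <= (that is, nu(x) <= x)?"""
    if isinstance(x, int):
        return x == y  # constants are fixed
    if x == "a":
        return y == "a" or isinstance(y, int)  # a dv goes to itself or a constant
    return True  # an ndv may go to any symbol of its column

def containment_mapping(T1, s1, T2, s2, attrs="ABC"):
    """Search for a valuation nu: T1 -> T2 with nu(s1) = s2; return it as a dict or None."""
    if s1.keys() != s2.keys() or not all(allowed(s1[A], s2[A]) for A in s1):
        return None
    start = {(A, s1[A]): s2[A] for A in s1}

    def extend(i, nu):
        if i == len(T1):
            return nu
        for r2 in T2:
            if r2["Tag"] != T1[i]["Tag"]:
                continue  # nu is the identity on tags
            new_nu = dict(nu)
            if all(allowed(T1[i][A], r2[A])
                   and new_nu.setdefault((A, T1[i][A]), r2[A]) == r2[A] for A in attrs):
                found = extend(i + 1, new_nu)
                if found is not None:
                    return found
        return None

    return extend(0, start)

S = {"A": "a", "B": "a"}
T_E1 = [
    {"A": "a", "B": "b2", "C": "b1", "Tag": "AC"},
    {"A": "b3", "B": "a", "C": "b1", "Tag": "BC"},
    {"A": "a", "B": "a", "C": "b4", "Tag": "AB"},
]
T_E2 = [
    {"A": "a", "B": "b1", "C": "b5", "Tag": "AB"},
    {"A": "a", "B": "b6", "C": "b2", "Tag": "AC"},
    {"A": "b3", "B": "a", "C": "b7", "Tag": "AB"},
    {"A": "b8", "B": "a", "C": "b4", "Tag": "BC"},
    {"A": "a", "B": "a", "C": "b9", "Tag": "AB"},
]
print(containment_mapping(T_E2, S, T_E1, S) is not None)  # True: E2 contains E1
print(containment_mapping(T_E1, S, T_E2, S) is not None)  # False
```

The failure of the second search is what the chase of section 3.4 is meant to repair once $\{A \to C, B \to C\}$ is imposed.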

3.4.  The  chase  of  a  tagged  tableau 

The chase of a tagged tableau is much the same as the chase of an untagged tableau as presented in chapter 2. There are some technical changes.

1) The dependencies are written as tagged tableaux. Each row of each tableau of a dependency has tag $U$.

2) A transformation $\langle d, m \rangle$ is enabled in a tableau $T$ iff

$$\pi_U(m(\mathrm{lhs}(d))) \subseteq \pi_U(T)$$

i.e., tags are ignored when testing a transformation.

3) For egd's, the partial order $\ll$, which refines $\leq$ by well ordering $\mathrm{Ndv}(A)$ for each $A \in U$, is used to apply egd's.

4) For tgd's, the rows added to a tableau by a tgd will have tag $U$. In other words, for $d$ a tgd, the added rows are, as before,

$$m(\mathrm{rhs}(d))$$


3.5.  Results 

The  key  result  is  presented  by  the  first  two  lemmas  and  their  corollary.  The 
essence  of  this  result  is  that  the  picture  of  figure  1  is  a  commutative  diagram 
with  certain  properties.  The  meanings  of  the  function  symbols  in  the  diagram 
are  given  in  the  course  of  the  proofs. 

Let $T$, $T'$ be tableaux and $\nu$ a valuation $\nu: T \to T'$. For $\chi$ a syntactically valid sequence for $T$, define the sequence $\chi_\nu$ as follows: if the $i$th transformation of $\chi$ is $\langle d, m \rangle$, then the $i$th transformation of $\chi_\nu$ is $\langle d, \nu \circ m \rangle$. The transformations of $\chi_\nu$ are formed by composing $\nu$ with the mappings in $\chi$.

Lemma 3.1. With $T$, $T'$, $\nu$, $\chi$, $\chi_\nu$ as above, $\chi_\nu$ is syntactically valid for $T'$. Further, $\nu$ is a valuation $\nu: \chi(T) \to \chi_\nu(T')$ provided $\chi_\nu(T') \neq \emptyset$.

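Proof. Suppose $\langle d, m \rangle$ is the first transformation in $\chi$; that is, $\pi_U(m(\mathrm{lhs}(d))) \subseteq \pi_U(T)$. Then $\pi_U(\nu \circ m(\mathrm{lhs}(d))) \subseteq \pi_U(T')$, so $\langle d, \nu \circ m \rangle$ is enabled in $T'$. We now show that $\nu: \langle d, m \rangle(T) \to \langle d, \nu \circ m \rangle(T')$ is a valuation function.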

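Suppose $d$ is an egd with $\mathrm{rhs}(d)$ = "$x_i = x_j$". Assuming $\langle d, m \rangle(T) \neq \emptyset$, the symbols $m(x_i)$, $m(x_j)$ become identified, as do $\nu(m(x_i))$, $\nu(m(x_j))$ if $\langle d, \nu \circ m \rangle(T') \neq \emptyset$. So $\nu: \langle d, m \rangle(T) \to \langle d, \nu \circ m \rangle(T')$. Tags are not disturbed by egd's. If $\langle d, m \rangle(T) = \emptyset$, then every function respecting $<$ is trivially a valuation.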

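If, on the other hand, $d$ is a tgd, then $\langle d, m \rangle(T) = T \cup S$ and $\langle d, \nu \circ m \rangle(T') = T' \cup S'$, where

$$S = m(\mathrm{rhs}(d)), \qquad S' = \nu \circ m(\mathrm{rhs}(d)).$$

So $\nu(S) = S'$, and $\nu: \langle d, m \rangle(T) \to \langle d, \nu \circ m \rangle(T')$ is a valuation function, as claimed. The above reasoning is extended by induction to prove the lemma. ■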

Lemma 3.2. Let $\rho$ be a satisfying state. Let $T$ be a tableau with summary row $s$; let $s^*$ be the summary row of $T^*$. ($s^*$ may differ from $s$ if a dv is "promoted" to a constant.) Let $N$ be the set of all valuation functions from $T$ to $T_\rho$, and let $N^*$ be the set of all valuation functions from $T^*$ to $T_\rho$. Then for every $\nu \in N$ there is an $\eta \in N^*$ such that $\nu(s) = \eta(s^*)$. The converse holds as well, provided $T^* \neq \emptyset$.

FIGURE 1.

Proof. Let $\nu \in N$. Let $\chi$ be a transformation sequence such that $\chi(T) = T^*$. By lemma 3.1, $\nu$ is a valuation $\nu: T^* \to \chi_\nu(T_\rho)$. However, it is not necessarily the case that $\chi_\nu(T_\rho)$ is $T_\rho$. Let $\xi$ be the mapping which takes every row of $\chi_\nu(T_\rho)$ to the row of $T_\rho$ to which it corresponds. We claim that the mapping $\eta$ induced by the equality $\eta(t) = \xi \circ \nu(t)$, for every $t \in T^*$, is a valuation function $\eta: T^* \to T_\rho$ such that $\eta(s^*) = \nu(s)$.

First, we see that $\eta$ is a function, since for any $t, t' \in T^*$, $t[A] = t'[A]$ implies $\nu(t)[A] = \nu(t')[A]$, which implies $\xi \circ \nu(t)[A] = \xi \circ \nu(t')[A]$. Secondly, $\eta$ respects $<$, by the rule for applying egd's. Thirdly, the equalities $t[\mathrm{Tag}] = \eta(t)[\mathrm{Tag}] = \xi \circ \nu(t)[\mathrm{Tag}]$ show that $\eta$ is tag preserving. Finally, $\eta(T^*) \subseteq T_\rho$ by definition. Thus we have $\eta \in N^*$.

It remains to show $\eta(s^*) = \nu(s)$. For each $A$ in the target relation scheme, the symbol $s[A]$ appears in some row of $T$. Call this row $t$. Now $\nu(t[A])$ is a constant, and since $\xi$ preserves constants, $\nu(t[A]) = \xi \circ \nu(t)[A] = \eta(t[A])$.

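To prove the converse, let $\eta \in N^*$. We define a mapping $\nu$ by describing its behaviour on a row $t \in T$. Let $\xi(t)$ be the corresponding row of $T^*$; such a row exists since $T^* \neq \emptyset$. The row $\eta \circ \xi(t)$ certainly has $\eta \circ \xi(t)[\mathrm{Tag}] = t[\mathrm{Tag}] \in R$. There is a unique row $t' \in T_\rho$ such that $\xi(t') = \eta \circ \xi(t)$: $t'$ is such that $t'[\mathrm{Tag}] = t[\mathrm{Tag}]$ and $t'[t[\mathrm{Tag}]] = \eta \circ \xi(t)[t[\mathrm{Tag}]]$. Define $\nu$ such that

$$\xi \circ \nu(t) = \eta \circ \xi(t).$$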

We must show $\nu$ to be a valuation function $\nu: T \to T_\rho$. That $\nu(T) \subseteq T_\rho$ is apparent. $\nu$ is certainly the identity on $\mathrm{tdom}(\mathrm{Tag})$. It respects $<$ because, for $t \in T$ and $A \in t[\mathrm{Tag}]$, $\nu(t[A])$ is a constant. Also, for $B \in U - t[\mathrm{Tag}]$, both $t[B]$ and $\nu(t[B])$ are ndv's appearing nowhere else in their respective tableaux. To show that $\nu$ is a function, suppose $t, t' \in T$ are distinct. Then $t[A] = t'[A]$ implies $\xi(t)[A] = \xi(t')[A]$, which implies $\eta \circ \xi(t)[A] = \eta \circ \xi(t')[A]$. But $t[A] = t'[A]$ also implies $A \in t[\mathrm{Tag}] \cap t'[\mathrm{Tag}]$. Therefore $\nu(t[A])$ and $\nu(t'[A])$ are both constants. From the chain of equalities

$$\nu(t[A]) = \xi \circ \nu(t)[A] = \eta \circ \xi(t)[A] = \eta \circ \xi(t')[A] = \xi \circ \nu(t')[A] = \nu(t'[A])$$

we have that $\nu$ is a function.

It remains to show that $\nu(s) = \eta(s^*)$. Note that for any attribute $A$ in the target relation scheme, $s[A] = t[A]$ for some row $t \in T$ where $A \in t[\mathrm{Tag}]$. Thus $\nu(t[A])$ is a constant, and $\xi \circ \nu(t)[A] = \eta \circ \xi(t)[A]$ must be a constant as well. But also $s^*[A] = \xi(t)[A]$, so $\eta(s^*[A]) = \nu(s[A])$, as required. ■

Corollary. Let $\rho$ be a satisfying state. For any expression $E$, $T_E(T_\rho) = T^*_E(T_\rho)$.

Proof. If $T^*_E \neq \emptyset$, this follows directly from lemma 3.2. For $T^*_E = \emptyset$ we must show that the set

$$\{\nu(s) \mid \nu \text{ is a valuation from } T_E \text{ to } T_\rho\}$$

is empty.

Assume otherwise. Let $\nu: T_E \to T_\rho$ be a valuation function, and let $\chi$ be a transformation sequence such that $\chi(T_E) = T^*_E = \emptyset$. Let $\tau$ be the transformation of $\chi$ such that $\tau(\zeta(T_E)) = \emptyset$ for $\zeta$ a prefix of $\chi$. If $\tau = \langle d, m \rangle$, then $d$ is an egd with $\mathrm{rhs}(d)$ = "$x_i = x_j$" and $m(x_i)$, $m(x_j)$ distinct constants. From lemma 3.1, $\langle d, \nu \circ m \rangle$ is enabled in $\zeta_\nu(T_\rho)$. But since $\rho$ is a satisfying state, $T^*_\rho \neq \emptyset$ by fact 3. Therefore $\nu \circ m(x_i)$, $\nu \circ m(x_j)$ are not distinct constants. This implies that $\nu$ does not respect $<$ and is therefore not a valuation function, contradicting lemma 3.1. ■

We write $E_1 \supseteq_C E_2$ ($E_1 \equiv_C E_2$) to mean that for all $\rho$ satisfying $C$, $E_1(\rho) \supseteq E_2(\rho)$ ($E_1(\rho) = E_2(\rho)$).

Lemma 3.3. $E' \supseteq_C E$ implies $T^*_{E'} \supseteq T^*_E$.

Proof. Let $\varphi$ be an injection from tableau symbols to constants which is the identity on constants and takes non-constants to constants not appearing in either $T^*_{E'}$ or $T^*_E$. Let the state $\rho_E$ be defined as follows: for every $R_i$,

$$\rho_E(R_i) = \{\varphi(r)[R_i] \mid r \in T^*_E \text{ and } r[\mathrm{Tag}] = R_i\}.$$

$\rho_E$ is a satisfying state since $\pi_U(\varphi(T^*_E))$ is a satisfying instance.

Consider $E(\rho_E)$. Let $s^*$ be the summary of $T^*_E$. We claim $\varphi(s^*) \in E(\rho_E)$. The rows of $T^*_E$ whose projections make up $\rho_E$ are in one-to-one correspondence with the rows of $T_{\rho_E}$. This correspondence induces a valuation function which takes the summary of $T^*_E$ to $\varphi(s^*)$, as required.

By the claim and the hypothesis, we have $\varphi(s^*) \in E'(\rho_E)$. Thus there is a valuation $\nu$ of $T^*_{E'}$ taking its summary to $\varphi(s^*)$. Therefore $\eta = \varphi^{-1} \circ \nu$ is the restriction of a containment mapping from $T^*_{E'}$ to $T^*_E$. Constants are preserved by $\eta$ by the choice of $\varphi$. ■

Theorem. $E \equiv_C E'$ iff $T^*_E \equiv T^*_{E'}$.

Proof.

(If) Let $\rho$ be any satisfying state. $T^*_E(T_\rho) = T^*_{E'}(T_\rho)$ implies $T_E(T_\rho) = T_{E'}(T_\rho)$ by the corollary, which implies $E(\rho) = E'(\rho)$ by fact 1.

(Only if) $T^*_{E'} \supseteq T^*_E$ (by lemma 3.3) implies there exists a valuation function $\nu: T_{E'} \to T^*_E$ which preserves summary rows. By lemma 3.1, $\nu$ is a valuation function from $T^*_{E'}$ to $T^*_E$. Suppose a transformation $\langle d, m \rangle$ alters the summary row of $T_{E'}$ by equating the dv $a$ to the constant $c$. Then, by lemma 3.1, either $\nu(a) = c$ or $T^*_E = \emptyset$. In the latter case, $T^*_{E'} \supseteq T^*_E$ vacuously; otherwise, $\nu$ is a containment mapping from $T^*_{E'}$ to $T^*_E$. ■

3.6.  The  example  revisited. 

Reconsider the example given earlier. The completion of $T_{E_2}$ under $\{A \to C, B \to C\}$ is




ABC 

Tag 

a  a 

summary 

a  b  i  62 

AB 

a  6  g  6  g 

AC 

b  3  a  b  2 

AB 

b  q  a  62 

BC 

a  a  62 

AB 

There is a containment mapping from $T^*_{E_1}$ to $T^*_{E_2}$. Thus $E_1$ contains $E_2$ within the set of states satisfying $\{A \to C, B \to C\}$. As $E_2$ contains $E_1$ everywhere, the two expressions are equivalent within satisfying states.


Chapter  4 

Functions  in  Databases 


4.1.  Introduction 

In this and the next chapter we will restrict ourselves to the study of functional dependencies. In this chapter, we discuss what it might mean for a functional dependency to be ‘present’ in a database schema. We find two distinct possibilities. The appearance of a dependency in the definition of a database indicates that the states of the database are to encode a function. A chase-based method of calculating the function encoded by a particular state is given and compared to methods utilizing derivations of the dependency. We present a test for deciding whether the states of a schema may encode a non-empty function. We also characterize the class of schemas capable of encoding non-empty functions for all the dependencies in the definition. This class is the class of dependency preserving schemas as defined by Beeri et al. [BMSU] and is strictly larger than the class presented by Bernstein.

Alternatively,  we  might  say  that  a  functional  dependency  is  present  in  a 
schema  if  the  dependency  is  capable  of  constraining  the  states  of  the  database; 
that  is,  capable  of  uncovering  input  errors  made  by  the  users.  We  show  that  this 
capability  is  strictly  weaker  than  the  first  objective;  thus,  even  dependencies 
whose  functions  are  everywhere  empty  may  still  act  as  constraints.  Bounds  on 
the  requirements  for  a  dependency  to  act  as  a  constraint  are  derived. 

A secondary result of this research touches on the derivation of a dependency from a set of dependencies. We give a method for interpreting a derivation as an expression in the relational algebra. We point out that two distinct derivations may result in nonequivalent expressions, even when applied to a single satisfying instance. We also show that the function represented by a multi-relation database state is not necessarily calculated by any of its derivations.




4.2.  Functional  Dependencies  Within  One  Relation 

We begin our investigation by considering the behaviour of functional dependencies within a relation. We first recall the definition of functional dependencies in their role as constraints.

(Satisfaction-1) Let $R$ be a relation scheme and $F$ a set of functional dependencies. If $I$ is an instance for $R$, then we say $I$ satisfies $F$ (or $I$ is a legal instance for $R$) if the following holds: for each dependency $X \to Y$ in $F$, for every pair of tuples $t$, $u$ in $I$, $t[X] = u[X]$ implies $t[Y] = u[Y]$. ■
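Condition (Satisfaction-1) can be checked directly, pair by pair. The Python sketch below is illustrative only. It assumes tuples are dictionaries keyed by single-character attribute names, and that a dependency $X \to Y$ is a pair of attribute strings such as `("XW", "Z")`. Both conventions are ours. The data is a small two-tuple instance over $XYWZ$.

```python
from itertools import combinations
from typing import Dict, Iterable, List, Tuple

FD = Tuple[str, str]  # (lhs attributes, rhs attributes), one character per attribute


def satisfies(instance: List[Dict[str, str]], fds: Iterable[FD]) -> bool:
    """True iff, for each X -> Y, every pair of tuples agreeing on X agrees on Y."""
    for lhs, rhs in fds:
        for t, u in combinations(instance, 2):
            if all(t[a] == u[a] for a in lhs) and any(t[b] != u[b] for b in rhs):
                return False
    return True


F = [("X", "Y"), ("YW", "Z"), ("XW", "Z")]
I = [{"X": "x1", "Y": "y1", "W": "w1", "Z": "z1"},
     {"X": "x2", "Y": "y1", "W": "w2", "Z": "z2"}]
print(satisfies(I, F))  # True
# a tuple sending y1 w2 to z1 violates YW -> Z
print(satisfies(I + [{"X": "x1", "Y": "y1", "W": "w2", "Z": "z1"}], F))  # False
```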

Now let us consider the functions described by the dependencies in $F$. The primary means whereby a relation instance relates attribute values is by including a tuple in which the values appear. With this in mind, assign each dependency in $F$ a unique label. Let $f$ be the label of $X \to Y$. Now define†

$$\varphi_f = \lambda I.\, \lambda x.\, \pi_Y(\sigma_{X=x}(I))$$

where $I$ is an instance and $x$ is an $X$-value. Thus $\varphi_f$ is a function from instances for $R$ to mappings from $X$-values to $Y$-values. Clearly, if $I$ is a satisfying instance then $\varphi_f(I)$ is a function. For any such satisfying instance $I$, let $f_I$ be the function represented by $I$ and described by the dependency $f$. We will introduce a definition for $f_I$ momentarily. Let us first point out the difficulties confronting such a definition. In particular, we may not conclude that $f_I = \varphi_f(I)$: $X$-values not in $\pi_X(I)$ may nonetheless have $Y$-values assigned to them by $f_I$. We demonstrate this by way of an example.

Example 4.1. Let $F = \{g: X \to Y,\ h: YW \to Z,\ f: XW \to Z\}$. Consider the instance $I$:


†The $\lambda$-operator is the abstraction operator of the lambda calculus. Say we have some expression $x^2$. If it is defined at all, $x^2$ is some number which we know exactly when we know $x$. The expression $\lambda x.\, x^2$ is the squaring function, which may be thought of as a set of ordered pairs.


X     Y     W     Z
x1    y1    w1    z1
x2    y1    w2    z2

We claim that $f_I(x_1 w_2) = z_2$. To justify the claim, note that any tuple added to $I$ containing $XW$-value $x_1 w_2$ must have $Y$-value $y_1$ (by $g_I(x_1) = y_1$) and $Z$-value $z_2$ (by $h_I(y_1 w_2) = z_2$) if the result is to be satisfying. Of course, $(\varphi_f(I))(x_1 w_2)$ is undefined. ■
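The mapping $\varphi_f = \lambda I.\, \lambda x.\, \pi_Y(\sigma_{X=x}(I))$ can be mirrored by nested Python closures. This is a sketch under our own representation: tuples are dictionaries, and $X$-values are Python tuples. On an instance in the style of example 4.1, it shows that $\varphi_f(I)$ returns the empty set at $x_1 w_2$, even though $f_I(x_1 w_2) = z_2$.

```python
from typing import Dict, List, Set, Tuple

Instance = List[Dict[str, str]]


def phi(lhs: str, rhs: str):
    """phi_f = lambda I. lambda x. pi_Y(sigma_{X=x}(I)) for f: lhs -> rhs."""
    def on_instance(instance: Instance):
        def at(x: Tuple[str, ...]) -> Set[Tuple[str, ...]]:
            # sigma_{X=x} keeps tuples with X-value x; pi_Y keeps their Y-values
            return {tuple(t[b] for b in rhs) for t in instance
                    if tuple(t[a] for a in lhs) == x}
        return at
    return on_instance


I = [{"X": "x1", "Y": "y1", "W": "w1", "Z": "z1"},
     {"X": "x2", "Y": "y1", "W": "w2", "Z": "z2"}]
phi_f = phi("XW", "Z")(I)
print(phi_f(("x2", "w2")))  # {('z2',)}
print(phi_f(("x1", "w2")))  # set(): undefined, although f_I(x1 w2) = z2
```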

We propose a definition of the function $f_I$ represented in a satisfying instance $I$ for the dependency $f: X \to Y$.

(Representation-1) The value of $f_I(x)$ is the unique value for $Y$, if such a value exists, which must appear in the $Y$-columns of any tuple $t$ having $t[X] = x$ if the instance $I \cup \{t\}$ is to be satisfying. If no such $Y$-value exists, $f_I$ is undefined at $x$.

If $x$ appears in $\pi_X(I)$, then $(\varphi_f(I))(x) = f_I(x)$. If, however, $x \notin \pi_X(I)$, then $\varphi_f(I)$ is undefined at $x$, but $f_I$ need not be. So we have $\varphi_f(I) \subseteq f_I$. The remainder of this section is concerned with calculating $f_I$.

For an $X$-value $x$, let the tuple $t_x$ be defined such that $t_x[X] = x$ and, for any attribute $B$ not in $X$, $t_x[B]$ is a unique non-distinguished symbol. For $I$ an instance, $I_x$, the $x$-augmentation of $I$, is the untagged tableau $I \cup \{t_x\}$.

If, within $I_x$, we replace the non-distinguished symbols with constants, we get an instance of $R$. Suppose we can prove that there is exactly one way of placing constants in the $Y$-columns such that the result is a satisfying instance of $R$. Then we will have calculated $f_I(x)$. The chase procedure may be adapted to this purpose. However, as we will be considering only functional dependencies, we will adopt a simplified notation for transformations. In this and the next chapter, a transformation $\langle d, s \rangle$ consists of a dependency $d$ and a pair of rows $s$. For example, suppose $X \to A$ is a dependency and rows $t$, $u$ of a tableau agree on $X$ and disagree on $A$. Then the transformation $\langle X \to A, \{t, u\} \rangle$ is enabled. The rows $t$, $u$ are said to be involved in the transformation. As usual, $I^*_x$ denotes the completion, under some known set of functional dependencies, of $I_x$.

Clearly, for any satisfying instance $J$ containing $I$ and a tuple with $X$-value $x$, there exists a valuation function from $I^*_x$ to $J$.

Lemma 4.1. Let $I$ be an instance of $R$; let $f: X \to Y$ be a dependency and $x$ an $X$-value. Then $f_I(x)$ is defined iff $I^*_x \neq \emptyset$ and no non-distinguished symbol appears in $(\varphi_f(I^*_x))(x)$. Further,

$$f_I(x) = (\varphi_f(I^*_x))(x)$$

when $f_I(x)$ is defined.

Proof. Let $t$ be any tuple with $t[X] = x$. If $I^*_x = \emptyset$, it is apparent that $I \cup \{t\}$ is not a satisfying instance. Thus no $Y$-value may be found to satisfy the definition, and $f_I(x)$ is undefined. Otherwise, assume $(\varphi_f(I^*_x))(x)$ contains no non-distinguished symbol, but $t[Y] \neq (\varphi_f(I^*_x))(x)$. Then there is no valuation function from $I^*_x$ to $I \cup \{t\}$; therefore, $I \cup \{t\}$ is not satisfying. Finally, assume $(\varphi_f(I^*_x))(x)$ contains a non-distinguished symbol in some column; say $(\varphi_f(I^*_x))(x)[Y_i] = b$. Assume $I \cup \{t\}$ is satisfying, and let $g$ be a valuation function from $I^*_x$ to $I \cup \{t\}$. Let $g'$ be defined such that $g'(a) = g(a)$ for $a \neq b$ and $g'(b) = c$, where $c$ is a new $Y_i$-value.* Then $g'(I^*_x)$ is a satisfying instance, implying $f_I(x)$ is undefined. ■
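Lemma 4.1 suggests a way to compute $f_I(x)$: build $I_x$, chase it with the functional dependencies, and read the result off the row $t_x$. The Python sketch below is our own illustration, not the author's procedure in every detail. Non-distinguished symbols are fresh `NDV` objects (a hypothetical helper class). A clash between two distinct constants makes `chase` return `None`, standing for $I^*_x = \emptyset$. When two ndv's are equated, the choice of which one survives is arbitrary rather than governed by the ordering $\ll$.

```python
from typing import Dict, List, Optional, Sequence, Tuple, Union


class NDV:
    """A non-distinguished symbol; two NDVs are equal only if they are the same object."""


Symbol = Union[str, NDV]
Row = Dict[str, Symbol]


def chase(rows: List[Row], fds: Sequence[Tuple[str, str]]) -> Optional[List[Row]]:
    rows = [dict(r) for r in rows]
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            for i in range(len(rows)):
                for j in range(i + 1, len(rows)):
                    t, u = rows[i], rows[j]
                    if any(t[a] != u[a] for a in lhs):
                        continue  # <lhs -> rhs, {t, u}> is not enabled
                    for b in rhs:
                        p, q = t[b], u[b]
                        if p == q:
                            continue
                        if not isinstance(p, NDV) and not isinstance(q, NDV):
                            return None  # two distinct constants: contradiction
                        old, new = (p, q) if isinstance(p, NDV) else (q, p)
                        for r in rows:  # identify old with new throughout the tableau
                            for c in r:
                                if r[c] is old:
                                    r[c] = new
                        changed = True
    return rows


def f_I(instance: List[Row], fds, attrs: str, lhs: str, rhs: str,
        x: Tuple[str, ...]) -> Optional[Tuple[str, ...]]:
    """f_I(x) as in lemma 4.1, or None where f_I is undefined at x."""
    t_x: Row = {a: NDV() for a in attrs}
    t_x.update(zip(lhs, x))
    completed = chase(list(instance) + [t_x], fds)
    if completed is None:
        return None  # I*_x is empty
    value = tuple(completed[-1][b] for b in rhs)
    return None if any(isinstance(v, NDV) for v in value) else value


F = [("X", "Y"), ("YW", "Z"), ("XW", "Z")]
I = [{"X": "x1", "Y": "y1", "W": "w1", "Z": "z1"},
     {"X": "x2", "Y": "y1", "W": "w2", "Z": "z2"}]
print(f_I(I, F, "XYWZ", "XW", "Z", ("x1", "w2")))  # ('z2',)
```

The chase reproduces the reasoning of example 4.1. $X \to Y$ gives $t_x$ the $Y$-value $y_1$, and $YW \to Z$ then fixes its $Z$-value to $z_2$.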

The crux of example 4.1 is that $f$ may be derived from $g$ and $h$. Sound and complete inference rules for deriving functional dependencies from a set of such dependencies have been known since the work of Armstrong [A]. The closure of a set $F$ of dependencies, denoted $F^+$, is the smallest set containing $F$ which is closed under the inference rules. Two sets of dependencies $F$, $G$ are equivalent, written $F \equiv G$, if $F^+ = G^+$. $G$ is said to be a cover of $F$ when $F \equiv G$. $G$ is a nonredundant cover of $F$ when no proper subset of $G$ is a cover of $F$.

*In the presence of domain constraints [F5], such a $Y_i$-value may not be available.
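Membership in $F^+$ is usually decided without enumerating $F^+$. Instead, one computes the closure $X^+$ of an attribute set under $F$, using the standard fact that $X \to Y \in F^+$ iff $Y \subseteq X^+$. This algorithm is not described in the text. The Python sketch below is included only to make closure, cover and equivalence concrete. Dependencies are written as pairs of attribute strings, which is our own convention.

```python
from typing import Iterable, Set, Tuple

FD = Tuple[str, str]


def attribute_closure(attrs: str, fds: Iterable[FD]) -> Set[str]:
    """X+ under fds: repeatedly add rhs(d) whenever lhs(d) is already included."""
    fds = list(fds)
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def implies(fds: Iterable[FD], lhs: str, rhs: str) -> bool:
    """True iff lhs -> rhs belongs to the closure F+ of fds."""
    return set(rhs) <= attribute_closure(lhs, fds)


def equivalent(f: Iterable[FD], g: Iterable[FD]) -> bool:
    """F and G are equivalent (F+ = G+) iff each implies every dependency of the other."""
    f, g = list(f), list(g)
    return all(implies(g, l, r) for l, r in f) and all(implies(f, l, r) for l, r in g)


F = [("X", "Y"), ("YW", "Z")]
print(implies(F, "XW", "Z"))             # True: f follows from g and h
print(implies(F, "X", "Z"))              # False
print(equivalent(F, F + [("XW", "Z")]))  # True: so F is a cover of F + {f}
```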

A particular sound and complete set of rules is given by Bernstein [B]. The rules are reflexivity, augmentation and pseudo-transitivity.

(reflexivity) $\emptyset \vdash X \to X$
(augmentation) $X \to Y \vdash XW \to Y$
(pseudo-transitivity) $\{X \to Y,\ YW \to Z\} \vdash XW \to Z$

Bernstein  has  shown  that  any  derivation  of  any  dependency  having  a  single 
attribute  on  the  right  from  a  set  of  such  dependencies  can  be  done  using 
pseudo-transitivity  as  the  sole  inference  rule  followed  by  at  most  one  application 
of  augmentation.  In  [B],  he  presented  a  labelled  graph  construction  which 
models  derivations  using  pseudo-transitivity  as  the  sole  inference  rule.  The 
graphs  are  called  derivation  trees  and  they  are  defined  recursively. 

i) If $A$ is an attribute, a single vertex labelled $A$ is a derivation tree.

ii) If $T$ is a derivation tree, $B_1 B_2 \cdots B_p \to C$ is a dependency and $C$ labels a leaf of $T$, then the tree formed from $T$ by adding $p$ leaves labelled $B_1, B_2, \ldots, B_p$ as descendants of $C$ is a derivation tree.

iii) Nothing else is a derivation tree.

A derivation tree built with respect to a set of dependencies $F$, whose leaves are labelled by the set $X$ and whose root is labelled $A$, is called an $F$-based derivation tree of $X \to A$. Such an object need not be unique.
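One possible encoding of derivation trees stores each vertex as a label with a list of children. It is given here only as an illustrative Python sketch with names of our own choosing. An internal vertex labelled $C$ whose children are labelled $B_1, \ldots, B_p$ must correspond to a dependency $B_1 \cdots B_p \to C$ of $F$. The dependency derived by the tree is (set of leaf labels) $\to$ (root label). The check treats left hand sides as sets, which matches the definition when the $B_i$ are distinct.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, List, Set, Tuple


@dataclass
class Node:
    label: str
    children: List["Node"] = field(default_factory=list)


def is_f_based(tree: Node, fds: Set[Tuple[FrozenSet[str], str]]) -> bool:
    """Each internal vertex C with children B1..Bp must use a dependency B1...Bp -> C of F."""
    if not tree.children:
        return True
    lhs = frozenset(c.label for c in tree.children)
    return (lhs, tree.label) in fds and all(is_f_based(c, fds) for c in tree.children)


def leaf_labels(tree: Node) -> Set[str]:
    if not tree.children:
        return {tree.label}
    return set().union(*(leaf_labels(c) for c in tree.children))


def derived_fd(tree: Node) -> Tuple[FrozenSet[str], str]:
    """The dependency X -> A derived by the tree: X = leaf labels, A = root label."""
    return frozenset(leaf_labels(tree)), tree.label


# Example 4.1: f: XW -> Z is derived from g: X -> Y and h: YW -> Z
F = {(frozenset("X"), "Y"), (frozenset("YW"), "Z")}
dt = Node("Z", [Node("Y", [Node("X")]), Node("W")])
print(is_f_based(dt, F))                             # True
print(sorted(derived_fd(dt)[0]), derived_fd(dt)[1])  # ['W', 'X'] Z
```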

We now present a method for constructing a relational algebra expression, called a derivation expression, from a derivation tree. The method proceeds in two stages. The first stage produces an expression over projection and a modified form of selection in which the selection formula may involve expressions. The second stage transforms the expression into one over selection, projection and join, demonstrating that the modified selection operator adds no new power to the relational algebra.

Stage 1. Let $Dt$ be a derivation tree. The expression constructed by stage 1, denoted $\gamma(Dt)$, is defined by recursion on the height of $Dt$ as follows. $\gamma(Dt) =$

i) if $\mathrm{height}(Dt) = 0$, then $l$, where $l$ is a formal variable associated with the label of the single node in $Dt$;

ii) if $\mathrm{height}(Dt) > 0$, letting the degree of the root of $Dt$ be $m$,

$$\pi_B(\sigma_{X_1 = \gamma(Dt_1) \wedge \cdots \wedge X_m = \gamma(Dt_m)}(I))$$

where $B$ is the label of the root of $Dt$ and $X_i$ is the label of the root of $Dt_i$, the $i$th subtree of $Dt$. As we are concerned with expressions computing functions, each evaluation of $\gamma(Dt_i)$ returns at most one value (formally, at most a singleton, unary relation instance).

The modified selection expression generated by part ii of the definition is meant to return the subset of the relation comprised of tuples whose $X_i$-value is the value returned by the expression $\gamma(Dt_i)$.

Stage 2. Let $e = \gamma(Dt)$. Define a function $\delta$ recursively on the depth of expression nesting of $e$ (equivalently, the height of $Dt$) as follows:

if the depth of $e$ is 0, then $\delta(e) = e$, which is some formal variable; otherwise, $e$ is $\pi_B(\sigma_F(I))$, where $F$ may be written

$$F_1 \wedge \cdots \wedge F_k \wedge G_1 \wedge \cdots \wedge G_l$$

for $k, l \geq 0$, where $F_i$ is a simple condition of the form "attribute = formal variable" and $G_j$ is of the form "attribute = stage 1 expression". In this case $\delta(e)$ is given by

$$\pi_B(\sigma_{F_1 \wedge \cdots \wedge F_k}(\delta(e_1) * \cdots * \delta(e_l) * I))$$

where $e_j$ is the stage 1 expression for $G_j$. The join portion of this expression is the subset of the relation comprised of tuples whose values are given by the values of the converted stage 1 expressions. Thus, informally, $\delta(\gamma(Dt))$ is as desired.

The stage two form is less perspicuous than the stage one form: it uses the join to simulate the selection presented more straightforwardly by the stage 1 form. Thus we will use the stage 1 form to define the derivation expressions. The expression formed from a derivation tree $Dt$ for an fd $f: L_1 \cdots L_k \to A$ is denoted $\psi_f$ and is given by

$$\psi_f = \lambda I.\, \lambda l_1 l_2 \cdots l_k.\, (\gamma(Dt))$$

where $\{l_i \mid 1 \leq i \leq k\}$ is the set of formal variables for the leaves of $Dt$. For $g: L_1 \cdots L_k L_{k+1} \cdots L_{k+m} \to A$ derived by augmentation from $f$, the corresponding expression is

$$\psi_g = \lambda I.\, \lambda l_1 \cdots l_k l_{k+1} \cdots l_{k+m}.\, (\gamma(Dt))$$

This effectively ignores the values of the attributes added by augmentation. Therefore we will feel free to denote by $\psi_f$ the expression for the derivation of any fd from $f$ by augmentation.

A derivation expression will be called trivial if the tree which generates it is trivial, i.e., of height 0. If the label on the only vertex of a trivial derivation tree is $A$, the dependency derived is $A \to A$ (by reflexivity). The expression for this tree is

$$\lambda I.\, \lambda a.\, a$$

which for every instance $J$ is the identity on $\mathrm{dom}(A)$. For consistency, we define, where $v$ is an $A$-value,

$$((\lambda I.\, \lambda a.\, a)(J))(v) = \{v\}$$

This convention allows us to replace the selection operator with a join. This form facilitates any proof by induction over the complexity of the expression. Note that some trivial dependencies may have non-trivial expressions.
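The stage 1 expression $\gamma(Dt)$ can be evaluated by direct recursion on the tree. The Python sketch below is an illustration under our own representation, not the paper's notation. A leaf returns the singleton $\{v\}$ of the value bound to its formal variable, following the convention for trivial expressions. An internal vertex labelled $B$ selects the tuples whose $X_i$-value lies in the result of the $i$th subtree, then projects them on $B$. Set membership coincides with the equality of the modified selection whenever each subexpression returns at most one value.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Node:
    label: str
    children: List["Node"] = field(default_factory=list)


def gamma(tree: Node, instance: List[Dict[str, str]], env: Dict[str, str]) -> Set[str]:
    """Evaluate the stage 1 expression gamma(tree) on an instance.

    env binds each leaf label to the value of its formal variable.
    """
    if not tree.children:
        return {env[tree.label]}  # ((lambda I. lambda a. a)(J))(v) = {v}
    sub = [gamma(c, instance, env) for c in tree.children]
    # modified selection sigma_{X1 = gamma(Dt1) and ... and Xm = gamma(Dtm)}, then pi_B
    return {t[tree.label] for t in instance
            if all(t[c.label] in vals for c, vals in zip(tree.children, sub))}


# The tree of example 4.1 for f: XW -> Z, evaluated at x = x1, w = w2
I = [{"X": "x1", "Y": "y1", "W": "w1", "Z": "z1"},
     {"X": "x2", "Y": "y1", "W": "w2", "Z": "z2"}]
dt = Node("Z", [Node("Y", [Node("X")]), Node("W")])
print(gamma(dt, I, {"X": "x1", "W": "w2"}))  # {'z2'}
```

On this instance the derivation expression returns $z_2$ at $x_1 w_2$, where $\varphi_f(I)$ is empty.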




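Example 4.2. Reconsider example 4.1. The derivation tree for $f$ from $g$ and $h$ is

[Figure: root $Z$ with children $Y$ and $W$; $Y$ has the single child $X$.]

The first stage expression for this tree is

$$\gamma(Dt) = \pi_Z(\sigma_{Y = \pi_Y(\sigma_{X=x}(I)) \wedge W = w}(I))$$

The output of stage two is

$$\delta(\gamma(Dt)) = \pi_Z(\sigma_{W=w}(\pi_Y(\sigma_{X=x}(I)) * I))$$

Rewritten without selection, this becomes

$$\pi_Z(w * (\pi_Y(x * I) * I))$$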

The following property of derivation expressions is basic.

Lemma 4.2. If $I$ is an instance satisfying a set of functional dependencies $F$ and $\psi$ is a derivation expression for $X \to A$ with respect to $F$, then for all $X$-values $x$, either $(\psi(I))(x) = \emptyset$ or $|(\psi(I))(x)| = 1$.

Proof. A simple induction on the depth of nesting in $\psi$, which is omitted. ■

If $(\psi(I))(x) = \emptyset$, we say $\psi$ is undefined at $x$ in $I$. If $(\psi(I))(x) = \{a\}$, we will write $(\psi(I))(x) = a$. Lemma 4.2 states that, if $\psi$ is a derivation expression for $X \to A$ built with respect to a set of dependencies $F$, then $\psi$ is a mapping from instances satisfying $F$ to functions from $X$-values to $A$-values.

Since a dependency may be derived from a set of dependencies in more than one way, $\psi_f$ as defined is uniquely determined only with respect to a given derivation tree. Say that there are $n$ distinct derivation trees for a given dependency.† Assign the integers $1, \ldots, n$ to these trees in any way. Denote by $\psi^i_f$ the expression generated by the $i$th derivation tree for $f$. We will show, by way of an example, that instances may exist for which not only $\psi^i_f(I) \neq \psi^j_f(I)$, but indeed domain values exist for which these functions return distinct results.

†There may be infinitely many distinct derivation trees.

Example 4.3. Let $F = \{X \to A,\ X \to B,\ YA \to Z,\ YB \to Z\}$. Then $F \vdash XY \to Z$ in two different ways. Consider the instance $I$, which satisfies $F$:

X     A     B     Y     Z
x1    a1    b1    y2    z1
x2    a1    b2    y1    z2
x3    a2    b1    y1    z3

The two expressions associated with the two derivations of $XY \to Z$ are

$$\psi^1 = \lambda I.\, \lambda xy.\, \pi_Z(\sigma_{Y=y \wedge A = \pi_A(\sigma_{X=x}(I))}(I))$$

and

$$\psi^2 = \lambda I.\, \lambda xy.\, \pi_Z(\sigma_{Y=y \wedge B = \pi_B(\sigma_{X=x}(I))}(I)).$$

The reader may verify that $(\psi^1(I))(x_1 y_1) = z_2$ and $(\psi^2(I))(x_1 y_1) = z_3$. Consider calculating $XY \to Z$ at $x_1 y_1$. $I_{x_1 y_1}$ is given by

X     A     B     Y     Z     Nbr
x1    a1    b1    y2    z1    1
x2    a1    b2    y1    z2    2
x3    a2    b1    y1    z3    3
x1                y1          4

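(the blanks representing non-distinguished symbols) and we apply the transformation sequence

$\langle X \to A, \{1, 4\} \rangle$
$\langle X \to B, \{1, 4\} \rangle$
$\langle YA \to Z, \{2, 4\} \rangle$
$\langle YB \to Z, \{3, 4\} \rangle$

the last transformation being contradicted: row 4 has by then acquired $Z$-value $z_2$, while row 3 has $z_3$. So $I^*_{x_1 y_1} = \emptyset$ and the function is undefined at that point. ■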

In light of example 4.3, we define for any functional dependency $f$ a mapping $\Psi_f$ from instances to functions. For $f: X \to A$, $\Psi_f$ is defined for an instance $I$ and $X$-value $x$ as

$$(\Psi_f(I))(x) = \bigcup_i (\psi^i_f(I))(x)$$

where the union is taken over all derivation expressions for $f$. This definition is sensitive to the particular set of dependencies with respect to which the derivations are carried out. A consequence of this sensitivity is pointed out in a subsequent section. We say that $\Psi_f(I)$ is undefined† at a value $x$ where $|(\Psi_f(I))(x)| \neq 1$. As above, if $(\Psi_f(I))(x) = \{a\}$, we will write $(\Psi_f(I))(x) = a$.
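For a finite list of derivation trees, $\Psi_f$ is the union of the individual results. The Python sketch below is our own illustration, and it re-declares a small tree evaluator so that it runs on its own. It applies $\Psi_f$ to the two derivations of $XY \to Z$ in example 4.3. The union at $x_1 y_1$ has two elements, so $\Psi_f(I)$ is undefined there, in line with $I^*_{x_1 y_1} = \emptyset$.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Node:
    label: str
    children: List["Node"] = field(default_factory=list)


def gamma(tree: Node, instance: List[Dict[str, str]], env: Dict[str, str]) -> Set[str]:
    """Stage 1 evaluation: leaves return {env value}; inner vertices select, then project."""
    if not tree.children:
        return {env[tree.label]}
    sub = [gamma(c, instance, env) for c in tree.children]
    return {t[tree.label] for t in instance
            if all(t[c.label] in vals for c, vals in zip(tree.children, sub))}


def big_psi(trees: List[Node], instance, env) -> Set[str]:
    """(Psi_f(I))(x): union of (psi_f^i(I))(x) over the given derivation trees."""
    out: Set[str] = set()
    for dt in trees:
        out |= gamma(dt, instance, env)
    return out


I = [{"X": "x1", "A": "a1", "B": "b1", "Y": "y2", "Z": "z1"},
     {"X": "x2", "A": "a1", "B": "b2", "Y": "y1", "Z": "z2"},
     {"X": "x3", "A": "a2", "B": "b1", "Y": "y1", "Z": "z3"}]
psi1 = Node("Z", [Node("Y"), Node("A", [Node("X")])])  # via X -> A, YA -> Z
psi2 = Node("Z", [Node("Y"), Node("B", [Node("X")])])  # via X -> B, YB -> Z
env = {"X": "x1", "Y": "y1"}
print(gamma(psi1, I, env), gamma(psi2, I, env))  # {'z2'} {'z3'}
print(len(big_psi([psi1, psi2], I, env)) == 1)   # False: Psi_f(I) undefined at x1 y1
```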

Finally, we add dependencies with multi-attribute right hand sides. For $f: X \to Y$ where $Y = Y_1 \cdots Y_k$, let $\Psi_{f_i}$ be the expression for $f_i: X \to Y_i$. Then $\Psi_f(I)$ is defined exactly at those points where all of the $\Psi_{f_i}(I)$ are defined, as the reader may verify. For the most part, we will restrict ourselves to single attribute right hand sides for convenience.

We now prove a helpful technical lemma. For a row $v \in T$, we say that a value $v[A]$ repeats in a tableau $T$ if there exists a row $w \in T$ with $w \neq v$ and $w[A] = v[A]$. $A$ is then called a repeating attribute of $v$ in $T$.
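Repeating attributes are straightforward to compute. In this Python sketch (our own representation, with blanks written as distinct placeholder strings such as `_1`), the tableau is $I_{x_1 y_1}$ from example 4.3. The repeating attributes of row 4 (the row $t_x$) are $X$ and $Y$.

```python
from typing import Dict, List, Set


def repeating_attributes(tableau: List[Dict[str, str]], i: int) -> Set[str]:
    """Attributes A such that some other row w (w != v) has w[A] == v[A], where v = tableau[i]."""
    v = tableau[i]
    return {a for a in v
            if any(j != i and w[a] == v[a] for j, w in enumerate(tableau))}


T = [{"X": "x1", "A": "a1", "B": "b1", "Y": "y2", "Z": "z1"},
     {"X": "x2", "A": "a1", "B": "b2", "Y": "y1", "Z": "z2"},
     {"X": "x3", "A": "a2", "B": "b1", "Y": "y1", "Z": "z3"},
     {"X": "x1", "A": "_1", "B": "_2", "Y": "y1", "Z": "_3"}]  # row 4: t_x
print(sorted(repeating_attributes(T, 3)))  # ['X', 'Y']
print(sorted(repeating_attributes(T, 0)))  # ['A', 'B', 'X']
```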

Lemma 4.3. i) Let $t$ be a row of a tableau $T$. Let $X$ be the set of repeating attributes of $t$ in $T$. Let $\chi$ be a transformation sequence valid for $T$; let $\xi$ be the correspondence function $\xi: T \to \chi(T)$. For every attribute $B$ such that $\xi(t)[B]$ repeats in $\chi(T)$, the dependency $X \to B$ can be derived from the set of dependencies appearing in the transformations of $\chi$.

ii) Let $t$, $T$, $X$, $\chi$, $\xi$ be as in part i. If $t$ is the only row of $T$ modified by $\chi$, then for every $B$ such that $\xi(t)[B]$ repeats in $\chi(T)$, there exists a derivation expression $\psi$ for $X \to B$ such that $(\psi(T))(t[X]) = \xi(t)[B]$.

iii) Let $f: X \to A$, let $I$ be a satisfying instance, and suppose $\psi_f(I)$ is defined at $x$. Then the derivation tree generating $\psi_f$ also generates a transformation sequence valid for $I_x$ and setting $t_x[A]$ to $(\psi_f(I))(x)$.

†Our use of the term ‘undefined’ to describe certain behaviour of the evaluation of expressions does not alter the fact that $(\Psi_f(I))(x)$ is well defined for all $I$, $x$. $(\Psi_f(I))(x)$ is in every case an instance of a relation whose scheme is given by the right hand side of $f$.

Proof. i) By induction on the number of transformations in the shortest prefix $\zeta$ of $\chi$ such that $\xi(t)[B]$ repeats in $\zeta(T)$. (Here $\xi(t)$ is the row corresponding to $t$ in $\zeta(T)$.)

If $\zeta$ is empty, then $B \in X$ and $X \to B$ is trivial. Otherwise, let the last transformation of $\zeta$ be $\langle W \to B, \{u, v\} \rangle$. One of $u$, $v$ corresponds to $t$, by the shortness of $\zeta$. By induction, $X \to W$ can be derived from the dependencies in $\zeta$. Therefore $X \to B$ can be derived from the dependencies in $\chi$.

ii) By induction on the length of the shortest prefix $\zeta$ of $\chi$ such that the row corresponding to $t$ in $\zeta(T)$ already has $B$-value $\xi(t)[B]$ ($\xi$ as above). For the basis, where $\zeta$ is empty, the trivial dependency has the trivial expression. For the induction, let $\langle W \to B, \{t, u\} \rangle$ be the final transformation of $\zeta$. If $W = W_1 \cdots W_k$, then there are expressions $\psi_i$ for $X \to W_i$, $1 \leq i \leq k$ (some of which may be trivial), such that

$$(\psi_i(T))(t[X]) = \xi(t)[W_i] = u[W_i].$$

Therefore

$$u \in \sigma_{\bigwedge_{i=1}^{k} W_i = (\psi_i(T))(t[X])}(T)$$

and the expression $\psi$ is obvious.

iii) By induction on the height of the derivation tree generating $\psi_f$. If the height is 1, then $I$ contains a tuple $w$ with $w[X] = x$ and the required transformation is $\langle f, \{t_x, w\} \rangle$.

For trees of height $m+1$, let the root dependency (the dependency attaching the immediate descendants of the root of the tree) be $Y \to A$, where $Y = Y_1 \cdots Y_k$. Let $\psi_{g_j}$ be the subexpression of $\psi_f$ for the subtree rooted at $Y_j$ ($1 \leq j \leq k$). Since $\psi_f(I)$ is defined at $x$, there is a row $v$ of $I$ with $v[Y_j] = (\psi_{g_j}(I))(x)$ for each $1 \leq j \leq k$. By the induction hypothesis, there is a transformation sequence $\zeta$ setting $t_x[Y_j] = (\psi_{g_j}(I))(x)$ for each $1 \leq j \leq k$. Thus the sequence $\zeta$ followed by $\langle Y \to A, \{t_x, v\} \rangle$ is as required. ■

Now we prove some facts relating derivation expressions to functions in the setting of a single relation. For a given instance $I$, $\Psi_f(I)$ may be defined at more values than $f_I$ is. However, the two functions agree wherever both are defined.

Proposition 4.1. For any satisfying instance $I$ and fd $f: X \to Y$,

$$\varphi_f(I) \subseteq \Psi_f(I).$$

Furthermore, if $f_I(x)$ is defined then

$$f_I(x) = (\Psi_f(I))(x).$$

Proof. The proof of the first claim is an easy induction over the height of an arbitrarily selected expression making up $\Psi_f$. Details are omitted. The second claim stems from the observation that the chase of $I_x$, if it does not encounter a contradiction, makes changes to the $t_x$ tuple only. By lemma 4.3 part ii, if in $I^*_x$, $t_x[C]$ is the constant $c$, some derivation expression for $X \to C$ returns $c$. Conversely, if some derivation expression for $X \to C$ returns a constant $c$, it may be converted to a transformation sequence which sets $t_x[C] = c$ by part iii of lemma 4.3. So $f_I(x) = c$ by lemma 4.1, if $f_I(x)$ is defined. ■

This proposition, with lemma 4.1, states that the value of a function at a point is computed by some derivation of that function. It suggests that the derivations may provide more information than is actually present in the relation, returning values where the function is undefined. This can in fact occur. Derivation expressions, and hence $\Psi_f$, ignore contradictions that may occur for some attribute $Z$ where $X \to Z$ but $Z \notin Y$. In the many relation case considered in the next section, we will see that the derivations may be less defined than the function.

We end this section by stating a fact about derivation trees. Suppose $F$ allows derivation trees in which an attribute labels more than one node on some root to leaf path. Then $F$ allows the construction of infinitely many derivation trees. (This occurs, for example, when $F = \{A \to B,\ B \to A\}$.) Those trees in which no attribute appears more than once on a root to leaf path are called bounded trees; there are clearly only finitely many of them.
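Bounded trees can be enumerated by expanding a vertex only with dependencies none of whose left hand side attributes already occur on the path back to the root. The following Python generator is our own sketch. Dependencies have single-attribute right hand sides and are written as pairs of strings, and trees are nested `(label, children)` tuples. It confirms that $\{A \to B, B \to A\}$ has only two bounded trees rooted at $B$. It also confirms that example 4.3 has exactly two bounded trees deriving $XY \to Z$.

```python
from itertools import product
from typing import FrozenSet, Iterator, List, Tuple

Tree = Tuple[str, tuple]  # (label, tuple of child trees)


def bounded_trees(root: str, fds: List[Tuple[str, str]],
                  path: FrozenSet[str] = frozenset()) -> Iterator[Tree]:
    """Yield every bounded derivation tree whose root is labelled root."""
    path = path | {root}
    yield (root, ())  # the one-vertex tree
    for lhs, rhs in fds:
        if rhs != root or any(b in path for b in lhs):
            continue  # wrong dependency, or expanding would repeat a label on a path
        options = [list(bounded_trees(b, fds, path)) for b in lhs]
        for children in product(*options):
            yield (root, tuple(children))


def leaves(tree: Tree) -> FrozenSet[str]:
    label, children = tree
    if not children:
        return frozenset({label})
    return frozenset().union(*(leaves(c) for c in children))


print(len(list(bounded_trees("B", [("A", "B"), ("B", "A")]))))  # 2
F = [("X", "A"), ("X", "B"), ("YA", "Z"), ("YB", "Z")]
xy = [t for t in bounded_trees("Z", F) if leaves(t) == {"X", "Y"}]
print(len(xy))  # 2
```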

Proposition 4.2. For any satisfying instance $I$ and dependency $f: X \to Y$, if $f_I(x)$ is defined then $f_I(x) = (\psi(I))(x)$, where $\psi$ is the expression for some bounded derivation tree for $f$.

Proof. By proposition 4.1, $f_I(x)$ is computed by some derivation expression. Say $(\psi'(I))(x) = f_I(x)$, where $\psi'$ comes from a non-bounded tree. We construct $\psi''$, whose tree has at least one fewer root to leaf path on which an attribute repeats, and show $(\psi''(I))(x) = (\psi'(I))(x)$.

The construction is straightforward. Choose identically labelled nodes $v_1$, $v_2$ on some root to leaf path, with $v_1$ an ancestor (i.e., closer to the root) of $v_2$. Choose them so that no vertex on the path from the root to $v_1$ is labelled identically to any vertex on the path from $v_2$ to the leaf. $\psi''$ is the expression of the tree formed by replacing the subtree rooted at $v_1$ with that rooted at $v_2$. The equality follows from the equality, at $x$, of the expressions for the subtrees rooted at $v_1$ and $v_2$. By part iii of lemma 4.3, were these subexpressions not equal at $x$, $I^*_x$ would be $\emptyset$. ■

Observe that at values of $X$ for which $f_I$ is not defined, a non-bounded expression may return a result not returned by any bounded expression. Sets of dependencies exist such that, for any $k > 0$, an instance satisfying the given set may be constructed in which $k$ distinct results are returned by $k$ different expressions for the same dependency.

Example 4.4. Let

$R = X_0X_1X_2X_3X_4X_5$
F  =  \X X 3  >A', 

A  lA  2~‘X 




For ease of construction, for each $i$ let $\mathrm{dom}(X_i) = \mathbb{N}$, the natural numbers.
Let  /  be  XqX^'^X^  one  of  whose  non-height-bounded  trees  is 


Consider  the  instance  of  R  given  by 


/ 


*0 

*1 

Xz 

*3 

*4 

x% 

0 

0 

0 

0 

0 

1 

1 

0 

0 

1 

0 

1 

• 

2 

2 

1 

0 

• 

• 

k- 1 

0 

fc-1 

k 

k 

k- 1 

0 

in  which  the  blanks  represent  unique  values.  Therefore  I  has  2k  +  1  tuples, 
the  first  of  which  contains  0  inJfo^j.  The  i  +  lst  contains 


•  for  i  odd 


in  0  in  X& 


•  for  i  even  in  X{X3  1  in  X4,  0  in  X6. 

C  C 

The reader may wish to verify that

$(\psi^1(I))(000) = 1, \quad (\psi^2(I))(000) = 2$

and that, if $\psi^l$ is the tree formed by adding $l - 1$ copies of $T_3$ to $\psi$, then for $l \le k$

$(\psi^l(I))(000) = l$




4.3.  Functional  Dependencies  in  Multi-relation  Databases 

We now take up the behaviour of functional dependencies in databases containing more than one relation. In [H], Honeyman gives an algorithm based on the algorithm of Downey et al. [DST] for chasing a tableau under fd's only. The algorithm has time complexity $O(n \log n)$, where $n$ is roughly the number of tuples in the state. Our interest is in the functions represented in a database state, to which we now turn.

(Representation-2) Let R be a schema and F a set of fd's. Let $f: X \to Y$ be a dependency in F or derivable from it. Let p be a state for R. Then for $x$ any X-value and $y$ a Y-value, $f_p(x) = y$ if and only if

i) there exists some weak instance, $w$, for p in which $f_w(x) = y$, and

ii) for any weak instance $w'$ for p, either $f_{w'}(x) = y$ or $f_{w'}$ is undefined at $x$.

Otherwise $f_p$ is undefined at $x$. ■

The functions represented in a database state map values in their domains to results which are required by the information in the state. By this definition, non-satisfying states represent only empty functions.

This definition does not suggest an effective means of calculating the functions. The method used in the single relation case may be adapted for use in the multi-relation case through the use of the tableau for the state.

Lemma 4.4. Let R be a scheme and F a set of fd's. Let p be a satisfying state for R and let $f: X \to A$ be a dependency in F or derivable from it. Let $x_0$ be an X-value. Let T be $T_p$ to which a row is added containing X-value $x_0$ and new, distinct non-distinguished symbols everywhere else. Then

$f_p(x_0) = (\varphi_f(T^*))(x_0)$

where $T^*$ is the completion of T with respect to F, and $f_p(x_0)$ is undefined if $T^* = \emptyset$ or $(\varphi_f(T^*))(x_0)$ contains a non-distinguished symbol. ■
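Lemma 4.4 turns the definition of $f_p$ into a procedure: add an X-row to $T_p$, chase, and read the A-value off that row. The Python below is a minimal sketch of that procedure under encoding assumptions of our own, not code from this report: a state is a dict from scheme (a tuple of attribute names) to a list of value tuples, each fd is a pair (left-hand-side tuple, single right-hand-side attribute), and non-distinguished symbols are fresh `Var` objects. The names `Var`, `chase`, `state_tableau` and `f_p` are hypothetical, and the chase is the naive pairwise one rather than the $O(n \log n)$ algorithm of [H].

```python
import itertools


class Var:
    """A non-distinguished symbol: a fresh object, equal only to itself."""

    _ids = itertools.count(1)

    def __init__(self):
        self.id = next(Var._ids)

    def __repr__(self):
        return f"b{self.id}"


def chase(rows, fds):
    """Naive chase of a tableau under fd's.

    rows: list of dicts attribute -> symbol (constants or Vars).
    fds: list of (lhs_tuple, rhs_attribute).
    Returns the chased rows, or None if two distinct constants must be
    equated (the completion is then empty).
    """
    rows = [dict(r) for r in rows]
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            for r, s in itertools.combinations(rows, 2):
                if r[rhs] == s[rhs] or not all(r[a] == s[a] for a in lhs):
                    continue
                u, v = r[rhs], s[rhs]
                if not isinstance(u, Var) and not isinstance(v, Var):
                    return None  # contradiction
                # rename a Var to the other symbol (a constant always wins)
                old, new = (u, v) if isinstance(u, Var) else (v, u)
                for row in rows:
                    if row[rhs] is old:
                        row[rhs] = new
                changed = True
    return rows


def state_tableau(universe, state):
    """T_p: one row per stored tuple, fresh Vars outside the tuple's scheme."""
    rows = []
    for scheme, tuples in state.items():
        for t in tuples:
            row = {a: Var() for a in universe}
            row.update(zip(scheme, t))
            rows.append(row)
    return rows


def f_p(universe, state, fds, lhs, rhs, x0):
    """Lemma 4.4: value of lhs -> rhs at x0 in a satisfying state, or None."""
    x_row = {a: Var() for a in universe}
    x_row.update(zip(lhs, x0))
    chased = chase(state_tableau(universe, state) + [x_row], fds)
    if chased is None:
        return None  # T* is empty
    value = chased[-1][rhs]  # the added X-row stays last
    return None if isinstance(value, Var) else value


U = ("A", "B", "C")
p = {("A", "B"): [("a1", "b1")], ("B", "C"): [("b1", "c1")]}
F = [(("A",), "B"), (("B",), "C")]
print(f_p(U, p, F, ("A",), "C", ("a1",)))  # c1
print(f_p(U, p, F, ("A",), "C", ("a2",)))  # None
```

Reading only the X-row suffices because $T^*$ satisfies $F^+$, so every row of $T^*$ whose X-value is $x_0$ carries the same A-symbol as the X-row.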




Example 4.5. Consider the system defined by:

$F = \{X_1 \to Y_1,\ X_2 \to Y_2,\ Y_1Y_2 \to X_1,\ Y_1Y_2 \to X_2,\ X_1X_2 \to Z\}$
$R = \{X_1Y_1,\ X_2Y_2,\ Y_1Y_2Z\}$

Let p be the state given by

    X1   Y1        X2   Y2        Y1   Y2   Z
    x1   y1        x2   y2        y1   y2   z

Let f be the dependency $X_1X_2 \to Z$. We may use lemma 4.4 to compute $f_p(x_1x_2)$. T is

          X1    X2    Y1    Y2    Z
    t1    x1    b1    y1    b2    b3
    t2    b4    x2    b5    y2    b6
    t3    b7    b8    y1    y2    z
    t4    x1    x2    b9    b10   b11

A transformation sequence for T with respect to F is

$\langle X_1 \to Y_1, [t_1, t_4] \rangle,$
$\langle X_2 \to Y_2, [t_2, t_4] \rangle,$
$\langle Y_1Y_2 \to X_1, [t_3, t_4] \rangle,$
$\langle Y_1Y_2 \to X_2, [t_3, t_4] \rangle,$
$\langle X_1X_2 \to Z, [t_3, t_4] \rangle$

which sets $b_{11} = z$. Therefore $(\varphi_f(T^*))(x_1x_2) = z = f_p(x_1x_2)$. This example justifies our interest in calculating functions on values not present in a database state. It does not seem reasonable to believe that any state of R "contains" any $X_1X_2$-value. Nonetheless, R has states in which f is defined. ■

In proposition 4.1 we saw that for the single relation case, a function is no more defined than its derivation expressions. We now demonstrate that this is not true in the multi-relation case. The derivation expressions can only be applied to a satisfying single relation; the tableau $T^*_p$ is the natural candidate. The prior example demonstrates the falseness of proposition 4.1 for the multi-relation case: the only derivation expression for $X_1X_2 \to Z$ with respect to F is just the simple select-project expression $\varphi_f$, and $T^*_p$ contains no row with $X_1X_2$-value $x_1x_2$. However, this result depends upon our choice for F. The next proposition gives an example insensitive to the choice of cover.
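The computation of example 4.5, and the failure of the select-project expression on $T^*_p$, can be replayed mechanically. The sketch below is our own illustration, not the report's code: states map schemes to value tuples, fd's are (left-hand-side tuple, right-hand-side attribute) pairs, non-distinguished symbols are fresh `Var` objects, and `chase` is a naive pairwise chase (all names hypothetical). The first chase adds the row $t_4$ and recovers $z$; the second chases $T_p$ alone and finds no row carrying $x_1x_2$.

```python
import itertools


class Var:
    """A non-distinguished symbol: a fresh object, equal only to itself."""


def chase(rows, fds):
    """Naive fd chase; returns None if two distinct constants are equated."""
    rows = [dict(r) for r in rows]
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            for r, s in itertools.combinations(rows, 2):
                if r[rhs] == s[rhs] or not all(r[a] == s[a] for a in lhs):
                    continue
                u, v = r[rhs], s[rhs]
                if not isinstance(u, Var) and not isinstance(v, Var):
                    return None
                old, new = (u, v) if isinstance(u, Var) else (v, u)
                for row in rows:
                    if row[rhs] is old:
                        row[rhs] = new
                changed = True
    return rows


def state_tableau(universe, state):
    """T_p: one row per stored tuple, fresh Vars outside the tuple's scheme."""
    rows = []
    for scheme, tuples in state.items():
        for t in tuples:
            row = {a: Var() for a in universe}
            row.update(zip(scheme, t))
            rows.append(row)
    return rows


U = ("X1", "X2", "Y1", "Y2", "Z")
F = [(("X1",), "Y1"), (("X2",), "Y2"), (("Y1", "Y2"), "X1"),
     (("Y1", "Y2"), "X2"), (("X1", "X2"), "Z")]
p = {("X1", "Y1"): [("x1", "y1")],
     ("X2", "Y2"): [("x2", "y2")],
     ("Y1", "Y2", "Z"): [("y1", "y2", "z")]}

# T = T_p plus the row t4: x1 x2 and fresh symbols elsewhere.
t4 = {a: Var() for a in U}
t4.update(X1="x1", X2="x2")
T_star = chase(state_tableau(U, p) + [t4], F)
print(T_star[-1]["Z"])  # z, so f_p(x1 x2) = z

# Chasing T_p alone: no row of T*_p carries the X1X2-value x1 x2.
Tp_star = chase(state_tableau(U, p), F)
print([r for r in Tp_star if r["X1"] == "x1" and r["X2"] == "x2"])  # []
```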

For the fd $f: X \to Y$, an expression $\psi$ will be considered undefined in $T^*_p$ at $x$ unless $(\psi(T^*_p))(x) = y$ for some constant Y-value $y$. Similarly, $(\Psi_f(T^*_p))(x)$ is said to be undefined unless it contains exactly one constant Y-value, and when defined it will be said to be equal to that value.

Proposition 4.3. There exists a schema R having a state p such that a function $g_p$ for $g \in F^+$ is more defined than $\Psi_g(T^*_p)$. This is insensitive to the choice of cover for F with respect to which $\Psi_g$ is defined.

Proof.  We  give  an  example  of  such  a  system.  Let 
F=[X^X 4,  X3-*X^  XJ^Xy,  X ±X q-*X y] .  Let  R=iX^X^X^Xy,  XzXsXs.  XxX2] 

Let  the  state  p  be  given  by 


X2 

*8 

*6 

0 

0 

0 

Xi 

*2 

0 

0 

*3 

*4 

*5 

*7 

0 

0 

0 

0 

Tp  is  given  by 


Xi  x2 

X3 

X  4 

*5 

X6 

X  y 

0 

0 

0 

0 

0 

b  1 

0 

0 

b  2 

0  0 

*1 

where the blanks stand for non-distinguished symbols which are not repeated. Let g be X iX^X^Xy. There are two derivations of g from F: one uses $X_1 \to X_4$, the other $X_2 \to X_4$. Using lemma 4.4, we can calculate $g_p(00) = 0$. But $(\Psi_g(T^*_p))(00)$ is undefined: the derivation through $X_1 \to X_4$ returns $b_2$, while the derivation through $X_2 \to X_4$ returns 0. Proofs of these contentions are left to the reader.

We can prove the result to be insensitive to the choice of cover for F by showing that F is unique; that is, for any set of fd's G such that the right hand side of each dependency in G is a single attribute, $G \equiv F$ and G non-redundant implies $G = F$.




So let G be non-redundant and equivalent to F. We know $G \vdash X_1 \to X_4$. So there is some fd in G with left hand side $X_1$, since, by inspection, none of the inference rules decreases the left hand side of any fd. $X_4$ is the only attribute in the closure of $X_1$ wrt F (other than $X_1$ itself). So $X_1 \to X_4 \in G$. Similar arguments hold for $X_2 \to X_4$ and $X_3 \to X_4$. Now note that each of $X_4$, $X_5$ determines only itself in $F^+$. Therefore the derivation of $X_4X_5 \to X_7$ from G does not proceed by pseudotransitivity from $X_4 \to W$, $X_5W \to X_7$ for any W (and symmetrically). So there is an fd $X_4X_5 \to Y$ in G. If Y is other than $X_7$, then $G \not\equiv F$. The same argument shows $X_4X_6 \to X_7$ is in G.

Since $F \subseteq G$, $G \equiv F$ and G is non-redundant, $G = F$ as claimed. ■

4.4.  Representation  by  a  Schema 

In  the  previous  section  we  gave  a  definition  of  function  representation  in  the 
state  of  a  multiple  relation  schema.  We  now  discuss  what  it  means  for  the 
schema  to  represent,  or  fail  to  represent,  a  function. 

(Representation-3) Let R be a schema and F a set of fd's. Let f be a dependency in or derivable from F. We say that R represents f if there exists a state p for R such that $f_p$ is not the empty function. ■

Consider the contrapositive of this definition. If in every state p for R, $f_p$ is defined nowhere, then surely it is reasonable to state that R does not represent f. Therefore any reasonable definition must be at least as strong as this one, and the results of this section are implied by any such definition.

The  main  result  of  this  section  is  that  a  schema  represents  all  the  functions 
of  interest  exactly  when  it  is  dependency  preserving  with  respect  to  them. 
Before  we  can  present  the  result,  we  need  to  make  some  preliminary  definitions. 

Associated with any schema there is an expression called the projection-join mapping of the schema. If $R = \{R_1, \ldots, R_k\}$ is a schema, the associated mapping, denoted $m_R$, is given by

$m_R(I) = \pi_{R_1}(I) \bowtie \cdots \bowtie \pi_{R_k}(I)$

where I is an instance of the universal relation for R. Thus $m_R$ is a mapping from universal instances to universal instances. R is a dependency preserving schema if for any satisfying instance I, $m_R(I)$ is a satisfying instance.
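Computing $m_R$ on a small instance shows what dependency preservation asks for. The sketch below is our own (hypothetical names `project`, `m_R`, `satisfies`; instances as lists of dicts; a naive nested-loop join that assumes the schemes cover the universe). For $R = \{AB, BC\}$ it maps a two-tuple instance satisfying $A \to C$ to a four-tuple instance that violates it, so this R is not dependency preserving for $F = \{A \to C\}$.

```python
from itertools import product


def project(instance, scheme):
    """pi_scheme(I), with I a list of dicts attribute -> value."""
    return {tuple(t[a] for a in scheme) for t in instance}


def m_R(universe, schemes, instance):
    """Projection-join mapping; assumes the schemes cover the universe."""
    parts = [project(instance, s) for s in schemes]
    result = set()
    for combo in product(*parts):
        t = {}
        # keep the combination only if the pieces agree on shared attributes
        if all(t.setdefault(a, v) == v
               for scheme, piece in zip(schemes, combo)
               for a, v in zip(scheme, piece)):
            result.add(tuple(t[a] for a in universe))
    return result


def satisfies(universe, tuples, lhs, rhs):
    """Does a set of universal tuples satisfy the fd lhs -> rhs?"""
    pos = {a: i for i, a in enumerate(universe)}
    seen = {}
    for t in tuples:
        key = tuple(t[pos[a]] for a in lhs)
        if seen.setdefault(key, t[pos[rhs]]) != t[pos[rhs]]:
            return False
    return True


U = ("A", "B", "C")
R = [("A", "B"), ("B", "C")]
inst = [dict(A="a1", B="b1", C="c1"), dict(A="a2", B="b1", C="c2")]
inst_tuples = {tuple(t[a] for a in U) for t in inst}
J = m_R(U, R, inst)
print(len(J), inst_tuples <= J)               # 4 True
print(satisfies(U, inst_tuples, ("A",), "C"))  # True
print(satisfies(U, J, ("A",), "C"))            # False
```

The result always contains the original instance; the extra tuples are what can break a dependency that no single scheme carries.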

A tableau, $T_{m_R}$, may be constructed for the projection-join mapping associated with a schema. This tableau has a row for each scheme $R_i$ in R, with distinguished symbols in the $R_i$-columns and non-distinguished symbols everywhere else, no non-distinguished symbol appearing more than once in the tableau. A dependency $f: X \to Y$ is said to be embedded in a tableau if there is some row of the tableau containing only distinguished symbols in the XY-columns. The following theorem is proven in [BMSU].

Theorem 4.1. The following are equivalent:

i) A schema R preserves a set of dependencies F.

ii) $T^*_{m_R}$ embeds some non-redundant cover of F.

iii) $T^*_{m_R}$ embeds every non-redundant cover of F. ■

From this theorem we can immediately see that any schema which is either cover-embedding or a lossless decomposition is dependency preserving. (A schema is a lossless decomposition if $T^*_{m_R}$ contains a row of only distinguished symbols. In this case, for all satisfying universal instances I, $m_R(I) = I$.)
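Theorem 4.1 and the lossless criterion are both tableau tests, and a small sketch makes them concrete. Assumptions (ours, not the report's): attributes are strings, schemes are tuples, the single distinguished symbol is the string `"a"` (one per column is enough when chasing with fd's), non-distinguished symbols are fresh `Var` objects, and the cover given to `preserves` is non-redundant so that condition ii applies. All names are hypothetical. On example 4.6, $T_{m_R}$ does not embed $AB \to C$ but its chase does and is lossless; on the schema of example 4.8 neither test succeeds.

```python
import itertools

DIST = "a"  # the distinguished symbol; one per column suffices for fd's


class Var:
    """A non-distinguished symbol: a fresh object, equal only to itself."""


def chase(rows, fds):
    """Naive fd chase; returns None if two distinct constants are equated."""
    rows = [dict(r) for r in rows]
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            for r, s in itertools.combinations(rows, 2):
                if r[rhs] == s[rhs] or not all(r[a] == s[a] for a in lhs):
                    continue
                u, v = r[rhs], s[rhs]
                if not isinstance(u, Var) and not isinstance(v, Var):
                    return None
                old, new = (u, v) if isinstance(u, Var) else (v, u)
                for row in rows:
                    if row[rhs] is old:
                        row[rhs] = new
                changed = True
    return rows


def t_mR(universe, schemes):
    """T_{m_R}: one row per scheme, distinguished in that scheme's columns."""
    return [{a: DIST if a in s else Var() for a in universe} for s in schemes]


def embeds(rows, lhs, rhs):
    """Some row holds only distinguished symbols in the XY-columns."""
    return any(all(row[a] == DIST for a in (*lhs, rhs)) for row in rows)


def lossless(universe, schemes, fds):
    """T*_{m_R} contains a row of only distinguished symbols."""
    chased = chase(t_mR(universe, schemes), fds)  # never None: one constant
    return any(all(v == DIST for v in row.values()) for row in chased)


def preserves(universe, schemes, cover):
    """Theorem 4.1 (ii); `cover` must be a non-redundant cover of F."""
    chased = chase(t_mR(universe, schemes), cover)
    return all(embeds(chased, lhs, rhs) for lhs, rhs in cover)


U = ("A", "B", "C")
R46, F46 = [("A", "C"), ("B", "C")], [(("A", "B"), "C"), (("C",), "B")]
R48, F48 = [("A", "B"), ("B", "C")], [(("A",), "C")]
print(embeds(t_mR(U, R46), ("A", "B"), "C"))          # False before chasing
print(lossless(U, R46, F46), preserves(U, R46, F46))  # True True
print(lossless(U, R48, F48), preserves(U, R48, F48))  # False False
```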

The following states a basic relationship between an arbitrary state and the tableau $T^*_{m_R}$.

Lemma 4.5. Let t, u be rows of $\chi(T_p)$ for a state p of a schema R and some transformation sequence $\chi$. Let t correspond to a tuple from relation T, u one from relation U. Let A be an attribute.

1) If $t[A] = u[A]$ then in $T^*_{m_R}$, $T[A] = U[A]$, where the relation names are used to denote the rows of $T_{m_R}$ representing them.

2) If $t[A]$ is constant then $T[A]$ is distinguished.

Proof. We show that any transformation sequence on a database state tableau can be carried out in the tableau $T_{m_R}$. Let $\langle h, [r, s] \rangle$ be a transformation of the sequence where r is from relation R, s from relation S. Convert this to $\langle h, [R, S] \rangle$. Note that if $R = S$, this is a null transformation, and if it makes the premise of 1 or 2 true the consequence will also hold.

We prove the lemma by induction on the length of the transformation sequence preceding the transformation making the premise of 1 or 2 true.

Basis. The first transformation in the sequence involves an fd whose left hand side is a subset of one of the relation schemes of R.

Induction. Assume the lemma holds for prefixes of length no greater than m. Assume a prefix of length m+1. We need to show that the (m+2)nd transformation becomes enabled in $T_{m_R}$ and that, if it makes the premise of property 1 or 2 true in $T_p$, then the consequence will hold in $T^*_{m_R}$.

Let $\tau = \langle W \to B, [r, s] \rangle$ be the (m+2)nd transformation, where $W = W_1 \cup \cdots \cup W_k$. Consider the transformation, if one exists, which set $r[W_i] = s[W_i]$. If no such transformation exists then this equality holds in $T_p$ and therefore in $T_{m_R}$, since $W_i \subseteq R$ and $W_i \subseteq S$. Assuming such a transformation does exist, it preceded $\tau$, has been executed, and $R[W_i] = S[W_i]$ by induction after its execution (and possibly before). So $\tau$ becomes enabled in $T_{m_R}$. Now assume $\tau$ has the effect of setting $t[B] = u[B]$. Then, possibly after renaming, $r[B] = t[B]$ and $s[B] = u[B]$ before execution of $\tau$; that is, these equalities were established by a transformation which preceded $\tau$. Therefore, by induction, $R[B] = T[B]$ and $S[B] = U[B]$ and, after $\tau$ is executed in $T_{m_R}$, $T[B] = U[B]$ (and possibly before).

Now assume $\tau$ sets the B-value of some row to a constant. This constant appeared in one of $r[B]$ or $s[B]$. Thus $R[B]$ (or $S[B]$) is distinguished by induction. Thus property 2 holds after $\tau$ is executed in $T_{m_R}$ (and possibly before). ■

For a schema R, the tableau $T_X$ is the tableau $T_{m_R}$ to which a row is added containing only distinguished symbols in the X-columns and new, non-distinguished symbols everywhere else. This added row will be called the X-row.

Lemma 4.6. A schema R represents a dependency $f: X \to Y$ in or derivable from a set of fd's F if and only if in $T^*_X$ the Y-value in the X-row is all distinguished symbols.

Proof. (Only if) This is an easy consequence of lemma 4.5. $T_X$ is the tableau of the projection-join mapping of the schema $R \cup \{X\}$. The procedure of lemma 4.4, which calculates the value of a function at a point, is the chase of a state of this schema.

(If) Consider a state p for R which is the set of projections of an instance of the universal relation containing a single tuple, t. We claim $f_p(t[X]) = t[Y]$.

Let T be $T_p \cup \{t_x\}$ where $t_x[X] = t[X]$, as in lemma 4.4. T may be considered to be the tableau $T_X$ if every constant is mapped to a distinguished symbol: there will be at most one distinguished symbol in every column. By hypothesis, $T^*$ contains a row containing only constant values in the XY-columns. Since the only Y-value in T is $t[Y]$, the claim is established. ■
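Lemma 4.6 gives a purely syntactic test for representation: build $T_X$, chase it, and inspect the Y-columns of the X-row. The sketch below is ours, under stated encoding assumptions: a single distinguished symbol `"a"`, fresh `Var` objects for non-distinguished symbols, fd's as (left-hand-side tuple, attribute) pairs, and hypothetical names `t_mR` and `represents`. It reproduces the claims made about example 4.5 ($X_1X_2 \to Z$ represented, $Y_1Y_2 \to X_1$ not) and about example 4.8.

```python
import itertools

DIST = "a"  # the distinguished symbol


class Var:
    """A non-distinguished symbol: a fresh object, equal only to itself."""


def chase(rows, fds):
    """Naive fd chase; returns None if two distinct constants are equated."""
    rows = [dict(r) for r in rows]
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            for r, s in itertools.combinations(rows, 2):
                if r[rhs] == s[rhs] or not all(r[a] == s[a] for a in lhs):
                    continue
                u, v = r[rhs], s[rhs]
                if not isinstance(u, Var) and not isinstance(v, Var):
                    return None
                old, new = (u, v) if isinstance(u, Var) else (v, u)
                for row in rows:
                    if row[rhs] is old:
                        row[rhs] = new
                changed = True
    return rows


def t_mR(universe, schemes):
    """T_{m_R}: one row per scheme, distinguished in that scheme's columns."""
    return [{a: DIST if a in s else Var() for a in universe} for s in schemes]


def represents(universe, schemes, fds, lhs, rhs_attrs):
    """Lemma 4.6: chase T_X and test the X-row's Y-columns."""
    x_row = {a: DIST if a in lhs else Var() for a in universe}
    chased = chase(t_mR(universe, schemes) + [x_row], fds)  # never None here
    return all(chased[-1][a] == DIST for a in rhs_attrs)


U45 = ("X1", "X2", "Y1", "Y2", "Z")
R45 = [("X1", "Y1"), ("X2", "Y2"), ("Y1", "Y2", "Z")]
F45 = [(("X1",), "Y1"), (("X2",), "Y2"), (("Y1", "Y2"), "X1"),
       (("Y1", "Y2"), "X2"), (("X1", "X2"), "Z")]
print(represents(U45, R45, F45, ("X1", "X2"), ("Z",)))   # True
print(represents(U45, R45, F45, ("Y1", "Y2"), ("X1",)))  # False
print(represents(("A", "B", "C"), [("A", "B"), ("B", "C")],
                 [(("A",), "C")], ("A",), ("C",)))       # False
```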

It is not difficult to show that dependency preserving schemas satisfy the conditions of lemma 4.6 for all dependencies in or derivable from F. Such schemas represent any dependency appearing in any non-redundant cover, as a consequence of theorem 4.1. For a dependency $X \to Y$ not in any cover, representation is an immediate consequence of the following proposition and the fact that any superset of a dependency preserving schema is dependency preserving.




Proposition 4.4. ([BMSU], proposition 1) Let R be a dependency preserving schema. Let X be the set of attributes of a row r of $T_{m_R}$ containing distinguished symbols. Then the row of $T^*_{m_R}$ corresponding to r has distinguished symbols in the attributes $X^+ = \{A \mid X \to A \in F^+\}$, and no attribute of r outside of $X^+$ repeats in $T^*_{m_R}$. ($X^+$ is called the closure of X, and $X^+$ is called a closed set of attributes.)
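Proposition 4.4 is phrased in terms of the closure $X^+$. The standard fixpoint computation is sketched below. It is the simple quadratic version rather than a linear-time algorithm, and the name `closure` and the fd encoding (pairs of a left-hand-side tuple and one right-hand-side attribute) are our assumptions. The dependencies are those of example 4.5.

```python
def closure(attrs, fds):
    """X+ = {A | X -> A in F+}: fire fd's whose lhs lies inside, to a fixpoint."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and rhs not in result:
                result.add(rhs)
                changed = True
    return result


F45 = [(("X1",), "Y1"), (("X2",), "Y2"), (("Y1", "Y2"), "X1"),
       (("Y1", "Y2"), "X2"), (("X1", "X2"), "Z")]
print(sorted(closure({"X1"}, F45)))        # ['X1', 'Y1']
print(sorted(closure({"Y1", "Y2"}, F45)))  # ['X1', 'X2', 'Y1', 'Y2', 'Z']
```

A closed set is one returned unchanged by `closure`, which is why applying it twice gives the same result.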

We will now show that dependency preserving schemas are in fact the only schemas which satisfy the conditions of lemma 4.6 for all dependencies.

Let $X \to A$ be a dependency in F not embedded in $T^*_{m_R}$, but embedded in $T^*_X$. We may assume, without loss of generality, that no row of $T^*_{m_R}$ has only distinguished symbols in the X-columns. Assume otherwise. Note that $T^*_X$ may be formed by chasing $T^*_{m_R} \cup \{t_X\}$, where $t_X$ is the X-row. If some row of $T^*_{m_R}$ has all distinguished symbols in the X-columns, then $T^*_{m_R} \cup \{t_X\}$ contains $T^*_{m_R}$ in the tableau containment sense. Since the chase preserves the containment relationship and $X \to A$ is embedded in $T^*_X$, it must be embedded in $T^*_{m_R}$, a contradiction. Assume, however, that $X \to A$ is embedded in $T^*_X$.

We may begin the chase of $T_X$ by first chasing the rows of $T_{m_R}$, transforming them to $T^*_{m_R}$. Let $\chi$ be any sequence of transformations with respect to F which completes the calculation of $T^*_X$ from this point. We wish to distinguish two types of transformations.

i) A transformation is of type i if it equates a non-distinguished symbol in the X-row to some other non-distinguished symbol.

ii) A transformation is of type ii if it equates a non-distinguished symbol in a row of $T^*_{m_R}$ to a distinguished symbol.

Lemma 4.7. If $\chi$ makes any two symbols in rows of $T^*_{m_R}$ equal which were not equal in $T^*_{m_R}$, then $\chi$ contains a transformation of type i or ii.

Proof. Assume otherwise. Let $\tau = \langle Z \to A, [t_1, t_2] \rangle$ be the first transformation of $\chi$ which makes two such symbols equal. Since there are no transformations of type ii in $\chi$, $t_1[A]$ and $t_2[A]$ must both be non-distinguished. Neither $t_1$ nor $t_2$ may be the X-row, by the exclusion of type i transformations. $\tau$ must not have become enabled during the calculation of $T^*_{m_R}$, else $t_1[A] = t_2[A]$. Therefore it became enabled during $\chi$. But then two of the symbols in $t_1$, $t_2$ became equated during $\chi$, violating the choice of $\tau$ as the first transformation with this effect. ■

Lemma 4.8. The dependency of the first transformation of type i or ii which appears in $\chi$ is not embedded in $T^*_{m_R}$, but its left hand side is embedded.

Proof. Let $\tau = \langle Z \to A, [t_1, t_2] \rangle$ be this transformation, if it exists. By lemma 4.7, one of $t_1$, $t_2$ must be the X-row; say $t_1$. Let $t_2$ be the other row, corresponding to a row of $T^*_{m_R}$. Since there are no prior transformations of type i, it must be that $t_2[Z]$ is all distinguished. Since there are no prior transformations of type ii, all these symbols must have been distinguished in $T^*_{m_R}$. But $t_2[A]$ is not distinguished, otherwise $\tau$ would not be of type i or ii. ■

As we have seen before, if Z is embedded in some row of $T^*_{m_R}$ but $Z \to C$ is not, then $T^*_Z$ will not embed $Z \to C$ and R will not represent it. Therefore we may assume that no transformation of type i or ii appears in $\chi$. Further, we may assume that every transformation of $\chi$ involves the X-row. Let $F'$ be the subset of F appearing in transformations of $\chi$. By lemma 4.3, $F' \vdash X \to A$. But $X \to A$ cannot be in $F'$, since no row of $T^*_{m_R}$ will have X become embedded in it. Therefore $X \to A$ is not in F, or F is redundant.

We have shown that a non-dependency preserving schema fails to represent at least one functional dependency in every cover. The reader should note that it may represent some of the non-embedded dependencies in a non-redundant cover. In example 4.5, the dependency $X_1X_2 \to Z$ was shown to be represented, even though it is not embedded in $T^*_{m_R}$. Note that the dependencies $Y_1Y_2 \to X_i$ ($i = 1, 2$) are not represented in that example. We express this result as a theorem.

Theorem 4.2. A schema R represents all the functional dependencies in or derivable from F if and only if it preserves the dependencies in F. ■

We can now prove a property of dependency preserving schemes which reinforces the belief that they are "good," namely that when an instance of the universal scheme is stored as the state of a dependency preserving schema, the functional associations of the decomposed state are exactly those of the instance. We will first prove a lemma somewhat interesting in itself and then derive this property as a corollary.

Lemma 4.9. Suppose R preserves a set of dependencies F and p is the set of projections of a satisfying universal instance I. For t a tuple of I, let $\nu_t$ be the valuation function taking the distinguished symbols to that row. Every row of $T^*_p$ is the image of some row of $T^*_{m_R}$ under a valuation function $\eta_u: T^*_{m_R} \to T^*_p$ agreeing with $\nu_u$ on distinguished symbols, for some $u \in I$. Further, $\eta_u$ is one-to-one on ndv's, and $\eta_t(v[A]) = \eta_u(v[A])$, where $v[A]$ is an ndv, only if $\eta_t(v) = \eta_u(v)$.

Proof. Note that valuation functions $\nu_t$, $\eta_t$ as described by the lemma exist, where $\eta_t: T_{m_R} \to T_p$. Since this is obvious, we need only prove $\eta_t: T^*_{m_R} \to T^*_p$ for each t. Let $\chi(T_{m_R}) = T^*_{m_R}$. Let $\chi_\eta$ be the result of replacing each transformation $\langle d, [r, s] \rangle$ in $\chi$ with the set of transformations

$\{\langle d, [\eta_t(r), \eta_t(s)] \rangle \mid t \in I\}$

By lemma 3.1, $\eta_t: T^*_{m_R} \to \chi_\eta(T_p)$. So the lemma is true when $\chi_\eta(T_p)$ is substituted for $T^*_p$. Now we claim that $\chi_\eta(T_p) = T^*_p$.

Since R is dependency preserving, the set of repeating attributes in each row of $T^*_{m_R}$ is closed and every repeating symbol is distinguished, by proposition 4.4. Thus the set of repeating attributes of every row of $\chi_\eta(T_p)$ is closed and every repeating symbol is a constant. So no change can be made to $\chi_\eta(T_p)$, by lemma 4.3. So $\chi_\eta(T_p) = T^*_p$, by definition. ■

Proposition 4.5. For R and I as in the prior lemma, $f_I = f_p$ for all $f \in F^+$.

Proof. Let f be $X \to A$ and consider an X-value x. We will show that for each transformation sequence $\xi$ valid for $T^*_p \cup \{t_x\}$ there is a sequence $\chi$ valid for $I_x$ such that

• either both of $\xi(T^*_p \cup \{t_x\})$ and $\chi(I_x)$ are empty or neither is;

• the X-rows of $\chi(I_x)$ and $\xi(T^*_p \cup \{t_x\})$ are identical up to renaming of ndv's.

For convenience, we denote both X-rows as $t_x$. We show the converse holds as well, which proves the proposition by lemmas 4.2 and 4.4.

Let $\langle W \to B, [u, t_x] \rangle$ be any transformation enabled in $T^*_p \cup \{t_x\}$. Then for some v in $T^*_{m_R}$ and some $t \in I$, $\eta_t(v) = u$. So $\langle W \to B, [t, t_x] \rangle$ is enabled in $I_x$, since $u[W]$ must be constants. By induction, this reasoning may be applied to each transformation of a sequence for $T^*_p \cup \{t_x\}$. At no point during such a sequence is any row of $T^*_p$ modified, as the set of repeating attributes in each row is closed and all repeating symbols are constants.

A similar argument applies to sequences of $I_x$. In this case, for a transformation $\langle W \to B, [t, t_x] \rangle$, there is at least one row $v \in T^*_{m_R}$ with all distinguished symbols in WB, so $\langle W \to B, [\eta_t(v), t_x] \rangle$ may be used in the corresponding chase of

4.5.  Functional  Dependencies  as  Constraints 

We have shown that a schema may represent functional dependencies which are not embedded in any relation of the schema. This has salutary effects on schema design. It has long been known that there exist sets of dependencies for which no cover-embedding schema may be found each of whose relations is in Boyce-Codd Normal Form (BCNF). It has also been known that lossless decompositions in arbitrarily stringent normal forms can always be found for any set of dependencies. (BCNF is the strongest normal form when only functional dependencies are considered.) Theorem 4.2 suggests that a lossless decomposition is "good enough". We present further evidence for this belief.

Example 4.6. Consider $R = ABC$, $F = \{AB \to C,\ C \to B\}$. No BCNF, cover-embedding decomposition of R exists. However, the schema $R = \{AC, BC\}$ is lossless and therefore dependency preserving. Therefore R represents the dependency $AB \to C$, which it does not embed. States for R may be constructed which are not satisfying precisely because they violate $AB \to C$, for example:

    A    C          B    C
    a1   c1         b1   c1
    a1   c2         b1   c2

This state is locally satisfying. However, the reader may convince himself, using the techniques of the prior sections, that no weak instance exists for it. The state is the projection of a universal instance containing the two tuples $a_1b_1c_1$ and $a_1b_1c_2$. This instance does not satisfy F. ■

Expanding on example 4.6, we would like to know under what circumstances a dependency has the power to act as a constraint on database states. If a schema represents the dependency $X \to Y$, then it is capable of assigning a Y-value to an X-value. It seems reasonable to believe that such a schema is capable of assigning more than one distinct Y-value to a given X-value; a state of the database in which this occurs is illegal. However, we can show that dependencies which are not represented may still act to constrain the set of satisfying database states.

Example 4.7. Let F be $\{A \to C,\ B \to C,\ CD \to E\}$. Let R be $\{AB, BDE, C\}$. Let a state p be

    A    B          B    D    E
    0    0          0    0    0
    0    1          1    0    1

with $p(C) = \emptyset$. p is locally satisfying but not satisfying, as the reader may verify. The reader may also verify that no dependency in F is represented by R. However, if any dependency is removed from F, p has a weak instance satisfying the remaining dependencies. ■
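Both examples can be checked mechanically by chasing $T_p$ and watching for two distinct constants being equated; we take as given the tableau characterization used in the prior sections, that a state has a weak instance exactly when this chase meets no contradiction. The code is a hypothetical sketch of ours: states map schemes to value tuples, fd's are (left-hand-side tuple, attribute) pairs, `satisfying` is our name, and $p(C)$ is modelled as an empty relation (no fd has C alone on the left, so the outcome does not depend on it).

```python
import itertools


class Var:
    """A non-distinguished symbol: a fresh object, equal only to itself."""


def chase(rows, fds):
    """Naive fd chase; returns None if two distinct constants are equated."""
    rows = [dict(r) for r in rows]
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            for r, s in itertools.combinations(rows, 2):
                if r[rhs] == s[rhs] or not all(r[a] == s[a] for a in lhs):
                    continue
                u, v = r[rhs], s[rhs]
                if not isinstance(u, Var) and not isinstance(v, Var):
                    return None
                old, new = (u, v) if isinstance(u, Var) else (v, u)
                for row in rows:
                    if row[rhs] is old:
                        row[rhs] = new
                changed = True
    return rows


def state_tableau(universe, state):
    """T_p: one row per stored tuple, fresh Vars outside the tuple's scheme."""
    rows = []
    for scheme, tuples in state.items():
        for t in tuples:
            row = {a: Var() for a in universe}
            row.update(zip(scheme, t))
            rows.append(row)
    return rows


def satisfying(universe, state, fds):
    """p is in SATW(R, F) iff the chase of T_p meets no contradiction."""
    return chase(state_tableau(universe, state), fds) is not None


p46 = {("A", "C"): [("a1", "c1"), ("a1", "c2")],
       ("B", "C"): [("b1", "c1"), ("b1", "c2")]}
F46 = [(("A", "B"), "C"), (("C",), "B")]
print(satisfying(("A", "B", "C"), p46, F46))  # False

U47 = ("A", "B", "C", "D", "E")
p47 = {("A", "B"): [(0, 0), (0, 1)],
       ("B", "D", "E"): [(0, 0, 0), (1, 0, 1)],
       ("C",): []}
F47 = [(("A",), "C"), (("B",), "C"), (("C", "D"), "E")]
print(satisfying(U47, p47, F47))  # False
```

In example 4.7 the chase first equates the C-symbols of all four rows (via $A \to C$ and $B \to C$) and then $CD \to E$ forces 0 = 1 in the E-column.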

If R is a relation scheme, it is customary to define

$SAT(R, F) = \{r \mid r \text{ is an instance of } R \text{ satisfying } F\}$

By analogy, if R is a database scheme we define

$SATW(R, F) = \{p \mid p \text{ is a state of } R \text{ having a weak instance wrt } F\}$

We  may  now  define  what  is  meant  by  a  dependency  acting  as  a  constraint. 

(Constraint.) Let R be a database scheme; F a set of fd's. Let f be a functional dependency. We say that f acts as a constraint on R wrt F if

$SATW(R, F \cup \{f\}) \neq SATW(R, F - \{f\})$

Note that $SATW(R, F \cup \{f\}) \subseteq SATW(R, F - \{f\})$ holds for all R, F, f. The definition implies that, when f constrains R, chasing states of R using all the dependencies in $F \cup \{f\}$ gives a different yes/no result for some states than is given by not using f. Thus if f does not act as a constraint on R, it does not affect the determination of satisfaction for any state of R.
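For a single candidate state p, the definition suggests a direct check: p witnesses that f constrains R when p is in $SATW(R, F - \{f\})$ but not in $SATW(R, F \cup \{f\})$. The sketch below uses a naive chase-based satisfaction test and hypothetical names (`satisfying`, `witnesses_constraint`); it confirms that the state of example 4.7 witnesses every dependency of that F acting as a constraint.

```python
import itertools


class Var:
    """A non-distinguished symbol: a fresh object, equal only to itself."""


def chase(rows, fds):
    """Naive fd chase; returns None if two distinct constants are equated."""
    rows = [dict(r) for r in rows]
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            for r, s in itertools.combinations(rows, 2):
                if r[rhs] == s[rhs] or not all(r[a] == s[a] for a in lhs):
                    continue
                u, v = r[rhs], s[rhs]
                if not isinstance(u, Var) and not isinstance(v, Var):
                    return None
                old, new = (u, v) if isinstance(u, Var) else (v, u)
                for row in rows:
                    if row[rhs] is old:
                        row[rhs] = new
                changed = True
    return rows


def satisfying(universe, state, fds):
    """Chase T_p built from the state; True iff no contradiction arises."""
    rows = []
    for scheme, tuples in state.items():
        for t in tuples:
            row = {a: Var() for a in universe}
            row.update(zip(scheme, t))
            rows.append(row)
    return chase(rows, fds) is not None


def witnesses_constraint(universe, state, fds, f):
    """state is in SATW(R, F - {f}) but not in SATW(R, F u {f})."""
    rest = [g for g in fds if g != f]
    return (satisfying(universe, state, rest)
            and not satisfying(universe, state, rest + [f]))


U47 = ("A", "B", "C", "D", "E")
p47 = {("A", "B"): [(0, 0), (0, 1)],
       ("B", "D", "E"): [(0, 0, 0), (1, 0, 1)],
       ("C",): []}
F47 = [(("A",), "C"), (("B",), "C"), (("C", "D"), "E")]
for f in F47:
    print(f, witnesses_constraint(U47, p47, F47, f))  # True for each f
```

So none of these dependencies is represented by R, yet each one acts as a constraint, as example 4.7 claims.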

From the definition, we see immediately that if $F - \{f\} \models f$, f does not act as a constraint. This is an application of the easily proven fact that for sets of dependencies F and G, $F \equiv G$ implies $SATW(R, F) = SATW(R, G)$ for any R. The converse of this statement is false, as we show by way of example.

Example 4.8. Let F be $\{A \to C\}$. Let R be $\{AB, BC\}$. From lemma 4.6 we deduce that R does not represent $A \to C$. Therefore the chase of $T_p$ for any state p of R never produces a tuple with a constant AC-value. Thus no such state contradicts $A \to C$. In short,

$SATW(R, \{A \to C\}) = SATW(R, \emptyset)$

and every state of R is satisfying. ■
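The claim of example 4.8 can be spot-checked by brute force over a small domain. The sketch below is ours, with values restricted to {0, 1}, so it is evidence for the claim rather than a proof: it enumerates all 256 states of $R = \{AB, BC\}$ and chases each. Every state satisfies $\{A \to C\}$, while the contrasting set $\{B \to C\}$, which R embeds, rules some states out. The helper names are hypothetical.

```python
import itertools


class Var:
    """A non-distinguished symbol: a fresh object, equal only to itself."""


def chase(rows, fds):
    """Naive fd chase; returns None if two distinct constants are equated."""
    rows = [dict(r) for r in rows]
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            for r, s in itertools.combinations(rows, 2):
                if r[rhs] == s[rhs] or not all(r[a] == s[a] for a in lhs):
                    continue
                u, v = r[rhs], s[rhs]
                if not isinstance(u, Var) and not isinstance(v, Var):
                    return None
                old, new = (u, v) if isinstance(u, Var) else (v, u)
                for row in rows:
                    if row[rhs] is old:
                        row[rhs] = new
                changed = True
    return rows


def satisfying(universe, state, fds):
    """Chase T_p built from the state; True iff no contradiction arises."""
    rows = []
    for scheme, tuples in state.items():
        for t in tuples:
            row = {a: Var() for a in universe}
            row.update(zip(scheme, t))
            rows.append(row)
    return chase(rows, fds) is not None


def subsets(xs):
    """All subsets of xs, as tuples."""
    xs = list(xs)
    return itertools.chain.from_iterable(
        itertools.combinations(xs, k) for k in range(len(xs) + 1))


U = ("A", "B", "C")
pairs = list(itertools.product((0, 1), repeat=2))
states = [{("A", "B"): list(ab), ("B", "C"): list(bc)}
          for ab in subsets(pairs) for bc in subsets(pairs)]
print(len(states))                                             # 256
print(all(satisfying(U, p, [(("A",), "C")]) for p in states))  # True
print(all(satisfying(U, p, [(("B",), "C")]) for p in states))  # False
```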

We now present a sufficient condition for a dependency to act as a constraint. We will then present a necessary condition. The two conditions are similar but not identical. We end the section by demonstrating that acting as a constraint is a "cover-sensitive" property of dependencies.

Assume that F is non-redundant and $f \in F$. Define the contribution of f wrt F by

$\mathit{contr}_F(f) = F^+ - (F - \{f\})^+$

An element of $\mathit{contr}_F(f)$ is a dependency all of whose derivations use f. Let d be a vector of distinguished symbols; the length of the vector may be deduced from the context in which it appears.
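Membership in $\mathit{contr}_F(f)$ reduces to two closure computations: g must follow from F but not from $F - \{f\}$. A minimal sketch with a quadratic closure (not Bernstein's linear-time membership algorithm) and hypothetical names, using the dependencies of example 4.7:

```python
def closure(attrs, fds):
    """Attribute closure: fire fd's whose lhs lies inside, to a fixpoint."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and rhs not in result:
                result.add(rhs)
                changed = True
    return result


def implies(fds, lhs, rhs):
    """F |= lhs -> rhs, decided by attribute closure."""
    return rhs in closure(lhs, fds)


def in_contr(fds, f, g):
    """g in contr_F(f) = F+ - (F - {f})+."""
    rest = [h for h in fds if h != f]
    return implies(fds, *g) and not implies(rest, *g)


F47 = [(("A",), "C"), (("B",), "C"), (("C", "D"), "E")]
f = (("A",), "C")
print(in_contr(F47, f, (("A", "D"), "E")))  # True
print(in_contr(F47, f, (("B", "D"), "E")))  # False
print(in_contr(F47, f, f))                  # True
```

$AD \to E$ needs $A \to C$ in every derivation, whereas $BD \to E$ can go through $B \to C$ instead.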

Theorem 4.3. Let R be a schema and F a non-redundant set of fd's. A dependency $f \in F$ acts as a constraint on R if there exists a dependency g in $\mathit{contr}_F(f)$ such that $(\Psi_g(T^*_{m_R}))(d) = a$.

Proof. Let $g \in \mathit{contr}_F(f)$ be $L \to B$ and let $(\Psi_g(T^*_{m_R}))(d) = a$. Construct w, a two-tuple universal instance. Let these tuples be $t_1$, $t_2$, and let $t_1[A] = t_2[A]$ precisely when $L \to A \in (F - \{f\})^+$. Let p be the set of projections of w onto the schemes of R. We show that p witnesses the fact that f acts as a constraint on R. That is, we show $p \in SATW(R, F - \{f\}) - SATW(R, F)$.

Part 1. $p \in SATW(R, F - \{f\})$.

We show w is in $SAT(R, F - \{f\})$. Let $h: M \to C$ be any fd in $F - \{f\}$. If $t_1[M] = t_2[M]$ then $L \to M \in (F - \{f\})^+$, so $L \to C \in (F - \{f\})^+$ and $t_1[C] = t_2[C]$ by construction.

Part 2. $p \notin SATW(R, F)$.

Assume, to the contrary, that p satisfies F. We note that $T_p$ contains two isomorphic images of $T_{m_R}$, namely, the projections of each of $t_1$ and $t_2$. Let these two images be $T_1$ and $T_2$. Since $T^*_p \neq \emptyset$, $T^*_p$ contains, in the set theoretic sense, the two images $T^*_1$, $T^*_2$ of $T^*_{m_R}$. Since $(\Psi_g(T^*_{m_R}))(d) = a$, for some derivation $\psi$ we know $(\psi(T^*_{m_R}))(d) = a$. So $(\psi(T^*_1))(t_1[L]) = t_1[B]$ and $(\psi(T^*_2))(t_2[L]) = t_2[B]$. By construction $t_1[L] = t_2[L]$, while $t_1[B] \neq t_2[B]$ because $L \to B \notin (F - \{f\})^+$. That is to say, $|(\psi(T^*_p))(t_1[L])| > 1$, contradicting lemma 4.2. It must be that p is not satisfying. ■

Although  the  condition  of  theorem  4.3  is  easy  to  state,  it  is  difficult  to  test. 

Proposition  4.6.  The  test  to  decide  if  an  fd  satisfies  the  criterion  of  theorem 
4.3  is  NP-hard. 

Proof. The following reduction is borrowed from Beeri and Honeyman [BH].

The hitting set problem is defined as follows: let $S_1, \ldots, S_n$ be finite sets. B is a hitting set iff $|B \cap S_i| = 1$ for each i. Hitting set was first demonstrated to be NP-complete by Karp [K].

We  transform  hitting  set  to  th&  test  for  theorem  4.3.  Let 

. Let  LmiP»lfAi^  |  for  nomm  S*. 

Any  set  oontaining  an  element  of  P  cannot  be  a  hitting  set. 

Let the schema $R = \{SC\} \cup \{A B_i \mid A \in S_i\}$, taken over each $A$ and each $S_i$. Let $F$ consist of the three sets

$F_1 = \{A \to B_i \mid A \in S_i\}$

$F_2 = \{A_i A_j \to C \mid \{A_i, A_j\} \in P\}$

$F_3 = \{B_1 \cdots B_n \to C\}$

Let $g: Q \to C$ for $Q \subseteq S$. If $g \in F^+$ then clearly $(\psi_g(T_R))(d) = a$. Let $f$ be $B_1 \cdots B_n \to C$. We claim $g \in \mathrm{contr}_F(f)$ iff $Q$ is a hitting set.

Suppose $Q$ is not a hitting set. If $Q$ contains some element of $P$, then $g$ can be derived from some element of $F_2$ by augmentation. Otherwise, $Q$ does not intersect some $S_i$, and $g \notin F^+$.

*Compare lemma 4.9.

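Now suppose $Q$ is a hitting set. Clearly $g \in F^+$. We claim $g \notin (F - \{f\})^+$. In $F - \{f\}$, only elements of $F_2$ contain $C$ on the right, and no element of $F_1$ contains any $A_i$ on the right. Since $Q$ contains no element of $P$, no element of $F_2$ can be used to derive $C$ from $Q$. So the claim is established. ■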
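The reduction can be checked mechanically on a small instance. The Python sketch below is illustrative only: the encoding of $B_i$ as the attribute name "B<i>", the single-attribute right-hand sides, and the sample sets are assumptions. It builds $F_1$, $F_2$ and $f$, then confirms for every $Q \subseteq S$ that $g: Q \to C$ lies in $F^+$ but not in $(F - \{f\})^+$ exactly when $Q$ is a hitting set in Karp's sense.

```python
from itertools import combinations

# Assumed encoding: every fd has one attribute on the right, written
# (frozenset_of_lhs_attributes, rhs_attribute); B_i is the name "B<i>".


def closure(attrs, fds):
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and rhs not in result:
                result.add(rhs)
                changed = True
    return result


def reduction(sets):
    """Return F = F1 + F2 + F3 and f = B_1...B_n -> C for the sets S_1..S_n."""
    n = len(sets)
    F1 = [(frozenset([a]), f"B{i}") for i, s in enumerate(sets, 1) for a in s]
    P = {frozenset(p) for s in sets for p in combinations(sorted(s), 2)}
    F2 = [(p, "C") for p in P]
    f = (frozenset(f"B{i}" for i in range(1, n + 1)), "C")  # F3 = {f}
    return F1 + F2 + [f], f


def in_contr(Q, F, f):
    """Is g: Q -> C in F+ but not in (F - {f})+ ?"""
    rest = [h for h in F if h != f]
    return "C" in closure(Q, F) and "C" not in closure(Q, rest)


def is_hitting_set(Q, sets):
    """Karp's hitting set: Q meets every S_i in exactly one element."""
    return all(len(set(Q) & s) == 1 for s in sets)


sets = [{"a", "b"}, {"b", "c"}, {"c", "d"}]
F, f = reduction(sets)
S = sorted(set().union(*sets))
subsets = [set(c) for r in range(len(S) + 1) for c in combinations(S, r)]
assert all(in_contr(Q, F, f) == is_hitting_set(Q, sets) for Q in subsets)
print(sorted(sorted(Q) for Q in subsets if in_contr(Q, F, f)))  # [['a', 'c'], ['b', 'd']]
```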

It is not clear whether the test of theorem 4.3 is in NP. The membership of $g$ in $\mathrm{contr}_F(f)$ may be tested in time linear in $|F|$ by the membership algorithm of Bernstein. However, as a consequence of proposition 4.3, testing $(\psi_g(T_R))(d) = a$ may require evaluating every derivation expression of $g$. Nevertheless, the test is decidable, as it references a fixed, finite universal instance $T_R$. (Contrast this with the case of example 4.4.)
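The membership test credited to Bernstein above amounts to computing an attribute closure. A minimal sketch of the standard linear-time, counter-based closure algorithm follows; it is the textbook technique, not code from this report, and the function names and fd encoding are hypothetical. Testing $g \in \mathrm{contr}_F(f)$ then takes two closures, one under $F$ and one under $F - \{f\}$.

```python
from collections import defaultdict, deque

# Counter-based attribute closure. fds are (lhs, rhs) pairs of frozensets;
# all names here are illustrative.


def linear_closure(attrs, fds):
    """Closure of attrs under fds in time linear in the total size of fds."""
    missing = [len(lhs) for lhs, _ in fds]   # lhs attributes not yet derived
    uses = defaultdict(list)                 # attribute -> fds with it on the left
    for i, (lhs, _) in enumerate(fds):
        for a in lhs:
            uses[a].append(i)
    result = set()
    queue = deque()

    def add(b):
        if b not in result:
            result.add(b)
            queue.append(b)

    for a in attrs:
        add(a)
    for i, (_, rhs) in enumerate(fds):
        if missing[i] == 0:                  # empty lhs: rhs holds outright
            for b in rhs:
                add(b)
    while queue:
        a = queue.popleft()                  # each attribute is dequeued once
        for i in uses[a]:
            missing[i] -= 1
            if missing[i] == 0:              # whole lhs derived: fire the fd
                for b in fds[i][1]:
                    add(b)
    return result


def in_contr(g, F, f):
    """g in contr_F(f): g follows from F but not from F - {f}."""
    lhs, rhs = g
    rest = [h for h in F if h != f]
    return rhs <= linear_closure(lhs, F) and not rhs <= linear_closure(lhs, rest)


def fd(lhs, rhs):
    return (frozenset(lhs), frozenset(rhs))


F = [fd("A", "B"), fd("B", "C")]
print(in_contr(fd("A", "C"), F, fd("B", "C")))  # True
print(in_contr(fd("A", "B"), F, fd("B", "C")))  # False
```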

Theorem 4.4. For a database schema R and $F$ a non-redundant set of fd's, if a dependency $f \in F$ acts as a constraint, then there is a dependency $g \in F^+$ and some derivation tree $Dt_g$ constructed using $f$ such that $(\psi_{Dt_g}(T_R))(d) = a$.

Proof. If $f$ acts as a constraint, then there exists a witness to that fact, i.e., an element $\rho$ of $\mathrm{SATW}(R, F - \{f\}) - \mathrm{SATW}(R, F)$. Let $\chi$ be a chase sequence with $\chi(T_\rho) = \emptyset$. We analyze $\chi$ with the goal of finding a derivation tree in which $f$ is used and then show, using lemma 4.5, that the expression for this tree returns the distinguished symbol on the vector of distinguished symbols.

Let $\xi$ be some proper prefix of $\chi$. Consider two rows $r, s$ of $\xi(T_\rho)$ such that for some attribute $A$, $r[A] = s[A]$. The set of transformations $\tau(r[A] = s[A])$ is the subset of $\xi$ directly responsible for this equality. If the equality holds in $T_\rho$, then $\tau(r[A] = s[A]) = \emptyset$. Otherwise, there is a unique transformation $v = \langle X \to A, [t, u] \rangle$ in $\xi$ such that before its execution $r[A] \neq s[A]$ and after its execution $r[A] = s[A]$. This requires, possibly after renaming, that $r[A] = t[A]$ and $s[A] = u[A]$ hold before execution of $v$ in $\xi$. Then

$$\tau(r[A] = s[A]) = \{v\} \cup \tau(r[A] = t[A]) \cup \tau(s[A] = u[A])$$


Let $\theta = \langle Z \to C, [v, w] \rangle$ be the last transformation of $\chi$, i.e., $\theta$ is a contradiction. Let $v[C] = c_1$ and $w[C] = c_2$ just before execution of $\theta$. The sets $\tau(v[C] = c_1)$ and $\tau(w[C] = c_2)$ can be defined analogously. We can now superimpose a directed graph on $\chi$.

The directed, arc-labelled acyclic graph $G(\chi)$ has a node for each transformation of $\chi$ and the nodes $\bot$ and $\top$. The arcs of $G(\chi)$ are given by

• an arc labelled $C$ is directed from $\top$ to $\theta$ and to each transformation in $\tau(v[C] = c_1) \cup \tau(w[C] = c_2)$.

• from a transformation $\langle Y_1 \cdots Y_k \to B, [x, y] \rangle$ in $\chi$, for each $i \in [1, \ldots, k]$ an arc labelled $Y_i$ is directed to each element of $\tau(x[Y_i] = y[Y_i])$. If this set is empty, then an arc labelled $Y_i$ leads to $\bot$.

The acyclicity of $G(\chi)$ is apparent: no arc leads to $\top$ nor from $\bot$, and if $\langle \eta, \zeta \rangle$ is an arc then $\zeta$ precedes $\eta$ in $\chi$.

One can establish by an easy induction that the set of all paths from $\top$ to $\bot$ contains that subset of $\chi$ which is necessary to reduce $T_\rho$ to $\emptyset$. Since $\rho$ is a witness for $f$, one of these transformations must use $f$.

Now consider any set $H$ of paths from $\top$ to $\bot$ in $G(\chi)$ which satisfies the following criteria:

i) every path in $H$ begins with the same arc;

ii) for every $\eta = \langle Y_1 \cdots Y_k \to B, [x, y] \rangle$ on some path in $H$, exactly $k$ of the arcs leaving $\eta$, no two labelled the same, are on paths of $H$.

$H$ corresponds in a natural way to a derivation tree. Let $\bar{H}$ be the graph formed from $H$ by letting the nodes of $\bar{H}$ be the arcs of $H$, labelled accordingly, with an arc leading from node $n_1$ to node $n_2$ in $\bar{H}$ if the node which arc $n_1$ of $H$ enters is the node from which arc $n_2$ of $H$ leaves; that is, $\bar{H}$ is the line digraph of $H$ [Ha]. (For $\bar{H}$ to be a tree, node splitting may be necessary in $H$ for those transformations with two incoming arcs.* Assume therefore that $\bar{H}$ is a tree.) The dependencies used to form the derivation tree $\bar{H}$ are the dependencies used in the transformations of $H$. The left hand side of the dependency represented by $\bar{H}$ is the set of attributes labelling arcs incoming to $\bot$; the right hand side is the label on the arc leaving $\top$, $C$ in this case.

*To find all the derivation trees in $G(\chi)$, node splitting should be done before selecting the paths of $H$. However, here we are searching for only one tree.
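The passage from $H$ to $\bar{H}$ is a line digraph construction. The Python sketch below assumes arcs are (tail, head, label) triples and uses the hypothetical node names "TOP" and "BOT" for $\top$ and $\bot$; the sample $H$, in which $\eta$ uses $Y_1Y_2 \to C$ and $\zeta$ uses $X \to Y_1$, is invented for illustration. The dependency read off this $\bar{H}$ is $XY_2 \to C$, which follows from $X \to Y_1$ and $Y_1Y_2 \to C$ by pseudo-transitivity.

```python
# Line digraph of a labelled digraph. Arcs are (tail, head, label) triples;
# "TOP" and "BOT" stand for the nodes written ⊤ and ⊥ (assumed names).


def line_digraph(arcs):
    """One node per arc; n1 -> n2 iff the head of n1 is the tail of n2."""
    return {n1: [n2 for n2 in arcs if n1[1] == n2[0]] for n1 in arcs}


def dependency_of(arcs, top="TOP", bottom="BOT"):
    """lhs = labels of arcs entering bottom; rhs = label of the arc leaving top."""
    lhs = {label for _, head, label in arcs if head == bottom}
    (rhs,) = {label for tail, _, label in arcs if tail == top}  # one arc, by i)
    return lhs, rhs


# Hypothetical H: eta uses Y1 Y2 -> C, zeta uses X -> Y1.
H = [("TOP", "eta", "C"), ("eta", "zeta", "Y1"), ("eta", "BOT", "Y2"),
     ("zeta", "BOT", "X")]
tree = line_digraph(H)
print(tree[("TOP", "eta", "C")])  # [('eta', 'zeta', 'Y1'), ('eta', 'BOT', 'Y2')]
print(dependency_of(H))
```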

As noted above, we may assume $H$ contains a transformation on $f$; therefore $\bar{H}$ is a derivation tree utilizing $f$. Let $\psi$ be the expression for $\bar{H}$. It remains to show that $(\psi(T_R))(d) = a$.

Each subexpression $\pi_A(\sigma_Q(I))$ of $\psi$ corresponds to a transformation of $H$. Call the set of tuples returned by $\sigma_Q(I)$ during some evaluation of $\psi$ the selected set for the transformation. Let the height of a transformation in $H$ be the length of the longest path from it to $\bot$. By induction on the height of a transformation, we show that its selected set in the evaluation of $(\psi(T_R))(d)$ includes the rows of $T_R$ representing the schemes of the rows of $T_\rho$ in the transformation.
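The induction above proceeds by height, which is a longest-path computation in an acyclic graph; memoized recursion computes it in time linear in the size of the graph. The successor map below is a hypothetical $H$ with assumed node names. The basis of the induction corresponds to height 1, the transformations all of whose arcs lead to $\bot$.

```python
# Height of each node: length of the longest path from it to ⊥ in an acyclic
# graph. succ maps a node to its successors; node names are hypothetical.


def heights(succ, bottom="BOT"):
    memo = {bottom: 0}

    def h(v):
        if v not in memo:
            # every node other than bottom is assumed to have a successor
            memo[v] = 1 + max(h(w) for w in succ[v])
        return memo[v]

    return {v: h(v) for v in succ}


succ = {"TOP": ["eta"], "eta": ["zeta", "BOT"], "zeta": ["BOT"], "BOT": []}
hs = heights(succ)
print(hs)  # {'TOP': 3, 'eta': 2, 'zeta': 1, 'BOT': 0}
# Order in which the induction treats the transformations: increasing height.
print(sorted((v for v in hs if v not in ("TOP", "BOT")), key=hs.get))  # ['zeta', 'eta']
```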

The basis concerns those transformations of $H$ enabled in $T_\rho$. For $\langle X \to A, [r, s] \rangle$ such a transformation, clearly $X \subseteq R \cap S$ ($R$ the scheme of $r$, $S$ the scheme of $s$), so the hypothesis holds.

For the induction, let $\eta = \langle Y_1 \cdots Y_k \to B, [r, s] \rangle$ be at height $m$, and let $R', S', V', W'$ denote the rows of $T_R$ for the schemes of $r, s, v, w$. If the $Y_i$-arc leads from $\eta$ to $\langle Z \to Y_i, [v, w] \rangle$, then by construction $r[Y_i] = v[Y_i] = w[Y_i] = s[Y_i]$ in $\chi(T_\rho)$, so $R'[Y_i] = V'[Y_i] = W'[Y_i] = S'[Y_i]$ in $T_R$ by lemma 4.5. If it leads to $\bot$, then $R'[Y_i] = S'[Y_i] = a$ in $T_R$. By the induction hypothesis and the expression for $\eta$, $\{R', S'\}$ is a subset of the selected set for $\eta$. So the induction is established.

Now consider the selected set for the root of $\psi$, the expression for the sole descendant of $\top$ in $H$. The rows of this transformation are constant in $\chi(T_\rho)$ on the label of the arc leaving $\top$ ($C$ in our example). Therefore they are distinguished on that attribute in the selected set, by lemma 4.5. This completes the proof. ■


We may use the techniques of the proofs of theorems 4.3 and 4.4 to narrow the gap between them somewhat.

Theorem 4.5. Let R be a schema, $F$ a non-redundant set of fd's, and $f \in F$. Suppose there exist $S \subseteq R$ and $g \in F^+$ such that

i) $(\psi_g(T_S))(d) = a$, and

ii) for every derivation tree $Dt_g$ such that $(\psi_{Dt_g}(T_S))(d) = a$, $f$ is used in $Dt_g$;

then $f$ constrains R with respect to $F$.

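Proof. Let $X$ be the left hand side of $g$. For each $B \in X^+$, say that $f$ is unnecessary for $B$ if there is some derivation tree $Dt$ for $X \to B$ in which $f$ is not used and $(\psi_{Dt}(T_S))(d) = a$. Define the set $\bar{X}$ by

$\bar{X} = \{A \mid A \in X^+ \text{ and } f \text{ is unnecessary for } A\}$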
We have $X \subseteq \bar{X} \subseteq X^+$. Form a two-tuple relation over the universe whose tuples agree exactly on the set $\bar{X}$. Let $\rho$ be the projection onto $S$ of these two tuples. We will show that $\rho$ is a witness for $f$.

We can show that $\rho \notin \mathrm{SATW}(R, F)$ in the manner used in the second part of the proof of theorem 4.3. To show that $\rho \in \mathrm{SATW}(R, F - \{f\})$, we will prove that any sequence $\chi$ such that $\chi(T_\rho) = \emptyset$ contains a transformation using $f$.

Again we exploit the two images $T_1, T_2$ of $T_S$ in $T_\rho$. Let us separate these images and chase each individually. (This may cause some duplication of rows, if some scheme of $S$ is a subset of $\bar{X}$. If this occurs, ensure by renaming that $T^*_1, T^*_2$ share only values in $\bar{X}$.) Let $\xi(T^*_1 \cup T^*_2) = \emptyset$. By the methods of theorem 4.4, we construct the graph $G(\xi)$. As in theorem 4.4, we can extract from this graph derivation trees whose expressions are defined in $T_S$ at $d$ to be $a$. An attribute labelling a leaf of any such tree is the label on an arc of $G(\xi)$ leading to $\bot$. The rows of the transformation from which this arc emanates agree on this attribute in $T^*_1 \cup T^*_2$. We claim that every such attribute is an element of $\bar{X}$. We establish this claim by proving that every transformation in $\xi$ involves a row of $T^*_1$ and a row of $T^*_2$. Such a pair of rows agree only on attributes in $\bar{X}$, by construction.

During the execution of $\xi$, no two rows $r, s$ of $T^*_i$ ($i = 1$ or $2$) become equal on any attribute on which they were not already equal. They may not become equal on any attribute unless the corresponding rows of $T^*_S$ are equal on that attribute, by lemma 4.5. As $T^*_i$ is an isomorph of $T^*_S$, they are already equal on that attribute. Therefore no transformation on any two rows of $T^*_i$ becomes enabled during the execution of $\xi$.

We have established our claim that every derivation tree which we may extract from $G(\xi)$ has its leaf attributes within $\bar{X}$. We therefore have expressions for dependencies of the form $Y \to B$ for $Y \subseteq \bar{X}$, which expressions return $a$ at $d$ in $T_S$. Clearly $B \in X^+ - \bar{X}$. Therefore, each of these expressions must use $f$, and so $f$ must be used in some transformation of $\xi$. ■

We  end  this  section  by  showing  that  the  property  of  acting  as  a  constraint  is 
'cover  sensitive'. 

Proposition 4.7. There exist non-redundant sets of dependencies $F$ and $G$ with $f \in F \cap G$, and a schema R, such that $f$ constrains R with respect to $F$ but not with respect to $G$.

Proof. Let R and $F$ be as given in example 4.5. Let $G$ be $F$ without $X_1X_2 \to Z$ and with $Y_1Y_2 \to Z$. Let $f$ be $Y_1Y_2 \to X_1$. Now $Y_1Y_2 \to Z \in \mathrm{contr}_F(Y_1Y_2 \to X_1)$ and $(\psi_{Y_1Y_2 \to Z}(T_R))(d) = a$. Thus $Y_1Y_2 \to X_1$ acts as a constraint on R with respect to $F$ by theorem 4.3.

We claim that no transformation on $Y_1Y_2 \to X_1$ can be contradicted in any state of R. (This fact is insensitive to the choice of cover.) By lemma 4.5, two tuples may agree on $Y_1Y_2$ only if both come from an instance of $Y_1Y_2Z$, as only that scheme has $Y_1Y_2$ in the closure of its set of attributes. No tuple from $Y_1Y_2Z$ may have a constant in $X_1$, again by lemma 4.5, so the claim is established.

If $\langle Y_1Y_2 \to X_1, [t_1, t_2] \rangle$ appears in the graph described in the proof of theorem 4.4, it must appear in $\tau(r[X_1] = s[X_1])$ for some $r, s$, and by the above reasoning all of $r, s, t_1, t_2$ come from $Y_1Y_2Z$ and agree on $Y_1Y_2$. Thus application of $\langle X_1 \to Y_1, [r, s] \rangle$ has no effect and may be omitted. But $X_1 \to Y_1$ is the only dependency in $G$ with $X_1$ on the left. Thus, we have shown directly that $\mathrm{SATW}(R, G) = \mathrm{SATW}(R, G - \{Y_1Y_2 \to X_1\})$. ■

4.6.  Other  Approaches 

Other  researchers  have  concerned  themselves  with  the  functions 
represented  in  a  database  state  for  the  functional  dependencies  of  the  schema. 
Both  Ling  and  Tompa  [LT]  and  Arora  and  Carlson  [AC]  define  the  function  fj  for 
the  single  relation  case  to  be  They  are  both  concerned  with  determining  if 

the function represented by a multi-relation state is equivalent to the same function in a single relation. Arora and Carlson consider only join consistent states; that is, their work makes the universal relation instance assumption. Ling and Tompa offer no model of the database as a whole. Both sets of authors give methods of calculating a function from its derivations. Neither method is presented in the relational algebra, and it is not apparent that their methods can be converted to such a presentation. Ling and Tompa are particularly concerned that all such calculations produce the same function, something we have shown not to be feasible even when the database consists of a single relation. Arora and Carlson specifically reject the method of calculation presented here after noticing that it results in functions more defined than the simple select-project functions. Significantly, they do not apply their method of calculating derived functions to their counter-example (figure 1 in [AC]). It produces the same result as the equivalent ψ-function. This is not surprising, as all sound techniques must produce the same result, if any. Neither set of authors


considers  the  technique  of  lemmas  4.1  and  4.4. 

We  have  argued  that  restricting  the  function  fj  to  be  is  unnatural.  We 

have demonstrated that a function is not necessarily calculated by its derivations. It is not clear, however, how the result of a function should be interpreted on a value not "present" in the state. If the functions in the state model functions in the world represented by the state, then these functions say something about that world, even at values not recorded in the database. In the single relation case, the user may wish to interpret the absence of a value as denoting the non-existence of some entity, relationship or the like. Thus if a function is defined at such an absent value, it might be interpreted as stating that, should such an entity come into existence, certain of its attributes are fixed by what is already known. On the other hand, if the database is thought to capture only partial information about the world, statements about existence in that world are less certain. As shown by example 4.5, in the multi-relation case it is much less certain which values are present or absent. In any case, these aspects of the user's interpretation are not captured by the theoretical model underlying this paper. That model attempts to derive statements which are true for any interpretation in which the dependencies are true.

The fact that distinct derivations of a given dependency may be nonequivalent when interpreted as relational expressions has long been known. The "uniqueness assumption" [B] requires that all such derivations calculate the same function. In the single relation case, proposition 4.1 verifies the uniqueness assumption at those values present in the instance. Some authors, in particular Sciore [Sc2], claim that the presence of nonequivalent derivation expressions indicates the "semantic overloading" of some attributes. It is questionable whether such an easy transition between user semantics and the syntax of dependencies is justifiable. Perhaps the user should be given the freedom to require nonequivalent derivations to be constrained to be equal while allowing provably equivalent derivations to disagree.

As a consequence of theorems 4.2 and 4.3, a schema designed by either a synthetic [BDB] or decompositional [Fl] algorithm represents and is constrained by all the functional dependencies given. One may wonder whether any practical benefit is to be gained by closing the gap between theorems 4.3 and 4.5. Although these algorithms guarantee nice theoretical properties, it is not certain that they guarantee "good" designs in practice. Dependencies do not capture all the semantics inherent in the user's interpretation, and they completely ignore performance considerations. There is much more to schema design than these algorithms capture. It is perhaps more useful to regard theoretical results such as these as providing schema analysis rather than design. Tsichritzis and Lochovsky [TL] present a fuller account of theoretical issues in this light. It seems important to consider arbitrary designs, even though the class of practically useful designs is likely to be small, since that class has yet to be identified.

4.7.  Summary 

In this chapter we have investigated the interrelationship of a schema, considered as a collection of subsets of the universe, and a set of functional dependencies. We have studied two properties of this interrelationship. A functional dependency may be interpreted as the description of a function. We have given the conditions under which a given schema represents a given dependency as a function and when it represents all of a given set of dependencies as functions. Interestingly, this last property is enjoyed by exactly the class of dependency preserving schemas as defined by [BMSU]. This class is strictly larger than the class of cover embedding schemas, which has heretofore been considered the largest class representing all of a given set of dependencies.


We have studied the calculation of the functions described by the dependencies. We considered the derivation of a dependency as a blueprint for the construction of a relational algebra expression. This is in keeping with the description of Armstrong's rules given by Bernstein [B]. The expression produced in this way does not, we discovered, always calculate the function. In the single relation case, if the function is defined at a given value, then at that value it agrees with the collection of its derivation expressions. However, if the function is undefined due to the non-existence of the requisite weak instance, this may not be noticed by the derivation expressions. In the multi-relation case, we have shown by example that the derivation expressions may fail to return a result at a value at which the function is defined. In general, therefore, we have shown that the method of derivation expressions is incomparable to the method of the chase.

We noted that under certain circumstances a set of dependencies may allow for infinitely many derivations. This can be ignored when the existence of any derivation of a given dependency is being tested, as in [B]. However, we have shown that it is possible for each of an infinite set of derivations of a given dependency to correspond to a different mapping from database states to functions.

Functional dependencies are also meant to act as constraints on the states of a schema. Although we have not fully characterized this phenomenon, we have shown a necessary condition and sufficient conditions for a given dependency to act as a constraint with respect to a schema and set of functional dependencies. In particular, a dependency may act as a constraint even though the function it describes is empty in every state. This property is strictly weaker than the property of being represented. Before these investigations, the distinct properties of being represented as a function and acting as a constraint on states, which a dependency may enjoy with respect to a schema, had been confused by other researchers, as we have shown.


Chapter  5 


Maintenance 


5.1.  Introduction. 

In  chapter  2  we  gave  an  algorithm  which  decided  whether  a  given  database 
state  satisfies  a  given  set  of  dependencies.  During  the  operation  of  a  database 
management  system,  the  database  will  be  modified.  Assuming  the  database  is  in 
a  consistent  state,  the  system  needs  to  maintain  its  consistency  in  the  presence 

of  modifications.  In  short,  the  system  is  faced  with  the  maintenance  problem. 

As in chapter 4, we will continue to restrict ourselves to functional dependencies. In this chapter we will be concerned with the following question.

Given a satisfying state $\rho$ and a tuple $t$ to be inserted in $\rho(R_i)$, is the result of this insertion a satisfying state?

It is straightforward to verify that the deletion of any tuple from a satisfying state results in a satisfying state. The in-place modification of the values in some tuple may be simulated as a deletion followed by an insertion. Solving the maintenance problem for insertion is therefore sufficient for solving it in general.

5.2.  A  worst  case  lower  bound. 

An algorithm for solving the maintenance problem may take advantage of the fact that the database is satisfying. We might suspect that utilization of this knowledge would allow the algorithm to reach a decision more quickly. We will show in this section that, in the worst case, this is not so.

It  is  a  simple  matter  to  exhibit  a  worst  case  lower  bound  for  the  satisfaction 
problem.  Suppose  a  state  p  is  constrained  by  a  set  of  dependencies  F. 

Proposition 5.1. The satisfaction problem for $\rho$ under $F$ cannot be decided in time better than $\Omega(|\rho| \times |F|)$, where $|\rho| = \sum_i |\rho(R_i)|$ is the number of tuples in $\rho$.

Proof.  To  determine  satisfaction,  it  is  necessary  to  examine  each  row  of  the 
tableau  for  the  database,  Tp ,  against  each  dependency.  ■ 
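A direct way to decide satisfaction is to chase $T_\rho$. The following Python sketch is a naive chase under stated assumptions: a state maps a scheme, written as a string of single-letter attribute names, to a set of tuples, and an fd is a pair of attribute strings. It merges symbols with a union-find structure and reports failure when two distinct constants would be equated. Each pass is quadratic in the number of rows, and it is not the congruence closure algorithm of Downey et al. The sample state is the one used for $m = 3$ in lemma 5.1.

```python
# Naive chase of the tableau T_rho with functional dependencies (a sketch).
# state: scheme string -> set of value tuples; fds: (lhs, rhs) attribute strings.


def satisfies_state(universe, state, fds):
    """Return False iff chasing T_rho with fds equates two distinct constants."""
    parent, const = {}, {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            x = parent[x]
        return x

    def union(x, y):
        rx, ry = find(x), find(y)
        if rx == ry:
            return True
        cx, cy = const.get(rx), const.get(ry)
        if cx is not None and cy is not None and cx != cy:
            return False                      # contradiction
        parent[ry] = rx
        if cx is None:
            const[rx] = cy                    # keep the class's constant, if any
        return True

    rows, fresh = [], 0
    for scheme, tuples in state.items():
        for t in tuples:
            row = {}
            for a in universe:
                if a in scheme:               # the tuple's own constant
                    row[a] = ("const", a, t[scheme.index(a)])
                    const[row[a]] = t[scheme.index(a)]
                else:                         # a fresh nondistinguished variable
                    row[a] = ("var", fresh)
                    fresh += 1
            rows.append(row)

    changed = True
    while changed:                            # apply fds until nothing changes
        changed = False
        for lhs, rhs in fds:
            for i, r in enumerate(rows):
                for s in rows[i + 1:]:
                    if all(find(r[a]) == find(s[a]) for a in lhs):
                        for b in rhs:
                            if find(r[b]) != find(s[b]):
                                if not union(r[b], s[b]):
                                    return False
                                changed = True
    return True


F = [("A", "C"), ("B", "C")]
rho = {"AB": {(1, 1)}, "AC": {(1, 1)}, "BC": {(1, 2)}}   # lemma 5.1, m = 3
print(satisfies_state("ABC", rho, F))                     # False
print(satisfies_state("ABC", {**rho, "BC": set()}, F))    # True
```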

Recall from chapter 4 that the congruence closure algorithm of Downey et al. will decide satisfaction in time $O(|\rho| \times |F| \times \log(|\rho| \times |F|))$. So the complexity of the satisfaction problem is well understood.

To show that the maintenance problem has the lower bound given in proposition 5.1, we show that certain pathological states may sometimes be constructed.

Lemma 5.1. There exist a schema R and a set of functional dependencies $F$ such that for any $m \geq 2$, a state $\rho$ of R with $|\rho| = m$ exists such that every proper substate of $\rho$ is satisfying with respect to $F$ but $\rho$ is not satisfying.

Proof. Let $R = \{AB, BC, AC\}$ and $F = \{A \to C, B \to C\}$. Note that every state of R of size less than 2 is satisfying. For the purpose of the following construction, assume without loss of generality that $dom(A) = dom(B) = \mathbb{N}$, the set of natural numbers.

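We consider the cases $m$ odd and $m$ even separately. For $m = 2$, choose $\rho(AB) = \rho(BC) = \emptyset$ and $\rho(AC) = \{\langle 1, 1 \rangle, \langle 1, 2 \rangle\}$. For larger $m$, assume $m = 2k$ and choose $\rho$ as given by the following diagram:

    ρ(AB)          ρ(BC)
    A    B         B    C
    1    1         1    1
    1    2         k    2
    2    2
    ...
    k-1  k-1
    k-1  k

and $\rho(AC) = \emptyset$. Thus $\rho(AB)$ has $2(k-1)$ tuples, where the $i$th tuple is $\langle \frac{i+1}{2}, \frac{i+1}{2} \rangle$ for $i$ odd and $\langle \frac{i}{2}, \frac{i}{2} + 1 \rangle$ for $i$ even.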


Let $f$ be $A \to C$ and $g$ be $B \to C$. If $\rho$ were satisfying, then for each $1 \leq j < k$ we must have $g_\rho(j) = f_\rho(j) = g_\rho(j+1)$, since both $\langle j, j \rangle$ and $\langle j, j+1 \rangle$ appear in $\rho(AB)$. Therefore $g_\rho(1) = g_\rho(k)$ by transitivity. But this equality is violated in $\rho(BC)$; thus $\rho$ is not satisfying.

Any proper substate $\sigma$ of $\rho$ will either have a proper subset of $\rho(BC)$, so that $g_\sigma(1) = g_\sigma(k)$ is not violated, or a proper subset of $\rho(AB)$, so that the chain of equalities above is broken and the conclusion does not follow.

For $m = 3$ choose the state

    ρ(AB)     ρ(AC)     ρ(BC)
    A  B      A  C      B  C
    1  1      1  1      1  2

For $m = 2k + 1$ ($k > 1$), we choose $\rho$ as follows

    ρ(AB)          ρ(AC)
    A    B         A    C
    1    1         1    1
    1    2         k    2
    2    2
    ...
    k-1  k
    k    k

and $\rho(BC) = \emptyset$. Thus $|\rho(AB)| = 2(k-1) + 1$ and $|\rho| = 2k + 1$. The argument that $\rho$ is as required is similar to the above. ■

As a result of lemma 5.1, we have the worst case lower bound on the maintenance problem.

Theorem 5.1. For arbitrary schemas R and sets of fd's $F$, the maintenance problem is of complexity $\Omega(|\rho| \times |F|)$.

Proof. Consider a state $\rho$ as in the lemma. Upon insertion of the final tuple into $\rho$, an algorithm must examine every tuple or it will not catch the error. Furthermore, the algorithm assumes only that the state without the to-be-inserted tuple is satisfying. Therefore it must use every dependency in $F$ or it may not find the error. ■

Theorem 5.1 applies to algorithms which assume only that the unmodified state is satisfying. It does not apply to incremental algorithms which may store more information.

Theorem 5.1 shows that in general maintenance requires an amount of resources determined by the size of the database as a whole. Databases whose sizes are measured in millions of tuples are not uncommon. Unless modifications are extremely rare, we must find a means of maintaining satisfaction which is independent of database size.* The results of this section show that this will require restricting the form of the schemas and dependency sets. Such is the goal of the next section.

*That is, if we are concerned only with worst case as opposed to average case complexity. The construction of lemma 5.1 produces a state very unlikely in practice.

5.3. Independence.

5.3.1. Introduction, Sagiv Independence

In this section, by constraining the form of the schema and dependency set, we reduce the satisfaction problem for the entire database state to the satisfaction problem for each instance in the state considered separately. We claim that systems so restricted have constant resource maintenance algorithms (constant in the size of the database state, that is).

Let $r = \rho(R)$ be a single relation instance and let $F = \{X_1 \to A_1, \ldots, X_k \to A_k\}$ be a non-redundant cover of the dependencies embedded in $R$. Assume $r$ satisfies $F$. Let $t$ be a tuple to be inserted into $r$. To determine if $r \cup \{t\}$ satisfies $F$, it suffices to retrieve from $r$ the set $T = \{u \mid u \in r \wedge u[X_i] = t[X_i] \text{ for some } 1 \leq i \leq k\}$. In fact we may take $|T| \leq k$, as at most one tuple need appear in $T$ for each $X_i$: since $r$ satisfies $F$, all tuples of $r$ that agree with $t$ on $X_i$ also agree with one another on $A_i$. Thus the worst case complexity of maintaining instances of $R$ is linear in the number of dependencies in the smallest cover of $F$. The actual complexity of retrieving $T$ depends upon the data structures which support retrieval from $r$. Since we do not wish to discuss the complexity of data structures, we assume $T$ can be retrieved in time proportional to its size. We will judge the time consumed by a maintenance algorithm to be the number of tuples it requests from the state. Thus we claim the maintenance of a single relation can be done in time proportional to the size of the smallest cover of the embedded dependencies, which is independent of $|\rho|$.
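The single-relation argument suggests keeping, for each $X_i \to A_i$, an index from $X_i$-values to one representative tuple, so that checking an insertion fetches at most $k$ tuples. The Python class below is a hypothetical illustration of that idea; dictionaries stand in for whatever retrieval structures a real system provides, and the stored relation is assumed to satisfy $F$ already.

```python
# Single-relation maintenance with one hash index per fd X_i -> A_i (a sketch;
# the class and its methods are hypothetical). Tuples are value tuples over the
# attribute string `attrs`; fds are pairs of attribute strings.


class IndexedRelation:
    def __init__(self, attrs, fds):
        self.attrs = attrs
        self.fds = fds
        self.index = [{} for _ in fds]   # per fd: t[X_i] -> one representative
        self.tuples = set()

    def _proj(self, t, xs):
        return tuple(t[self.attrs.index(a)] for a in xs)

    def can_insert(self, t):
        """Does r ∪ {t} satisfy the fds, given that r already does?"""
        for (lhs, rhs), idx in zip(self.fds, self.index):
            u = idx.get(self._proj(t, lhs))   # at most one tuple fetched per fd
            if u is not None and self._proj(u, rhs) != self._proj(t, rhs):
                return False
        return True

    def insert(self, t):
        if not self.can_insert(t):
            raise ValueError(f"inserting {t} would violate a dependency")
        for (lhs, _), idx in zip(self.fds, self.index):
            idx.setdefault(self._proj(t, lhs), t)
        self.tuples.add(t)


r = IndexedRelation("ABC", [("A", "B"), ("B", "C")])
r.insert((1, 2, 3))
r.insert((4, 2, 3))
print(r.can_insert((1, 5, 3)))   # False: violates A -> B
print(r.can_insert((7, 2, 9)))   # False: violates B -> C
print(r.can_insert((7, 2, 3)))   # True
```

One representative per index key suffices for exactly the reason given above: every stored tuple with the same $X_i$-value has the same $A_i$-value.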

We call a schema an independent schema if, for every state, the fact that each instance of the state is satisfying with respect to its embedded dependencies implies that the state is satisfying with respect to all the dependencies.

We give a precise definition.

Recall from chapter 4 the definition, for a fixed schema R and a set of dependencies $F$,

$\mathrm{SATW}(F) = \{\rho \mid \rho \text{ a state of } R \text{ satisfying } F\}$

Let $F^+|R_i$ be the subset of $F^+$ embedded in $R_i$. Define the set of locally satisfying states of R by

$\mathrm{LSAT}(F) = \{\rho \mid \text{for each } i,\ \rho(R_i) \text{ satisfies } F^+|R_i\}$

Definition.  A  schema  R  is  independent  with  respect  to  a  set  of  functional 
dependencies  F,  if 

SATW(F)=LSAT(F) 

Of course $\mathrm{SATW}(F) \subseteq \mathrm{LSAT}(F)$ always holds, so proving independence requires establishing the reverse inclusion.

In [Sag], Sagiv characterized a subclass of the independent schemas: those independent schemas which are constructed by the synthesis algorithm of Biskup [BDB], itself a variant of the synthesis algorithm of Bernstein.

A schema produced by this algorithm has the property of embodying a non-redundant cover of the set of dependencies which was input to the algorithm. A dependency is said to be embodied in a relation scheme if its left hand side is a key of the relation. A subset $X$ of a relation scheme $R$ is a key of $R$ if $X \to R$ holds and, for every proper subset $X' \subset X$, $X' \to R$ does not hold. A relation may have more than one key. A dependency whose left hand side is a key of the relation in which it is embodied is called a key dependency. Let $K_i$ be the set of key dependencies of $R_i$. The satisfying states of a schema produced by the Biskup algorithm form the set $\mathrm{SATW}(\bigcup_i K_i)$.
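Keys, as just defined, can be found for small schemes by brute force over subsets in order of size. The Python sketch below is exponential and is meant only to illustrate the definition; the fd encoding is assumed. On the dependencies $\{X \to A, A \to B, B \to A\}$ of Sagiv's counterexample discussion, it finds the key $X$ of $XA$ and the keys $A$ and $B$ of $AB$.

```python
from itertools import combinations

# Keys of a relation scheme by brute force over subsets (exponential; small
# examples only). fds are pairs of attribute strings.


def closure(attrs, fds):
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def keys(scheme, fds):
    """All minimal X ⊆ scheme with X -> scheme in F+."""
    found = []
    for size in range(len(scheme) + 1):
        for xs in combinations(sorted(scheme), size):
            x = set(xs)
            if any(k <= x for k in found):
                continue            # contains a key already found, so not minimal
            if set(scheme) <= closure(x, fds):
                found.append(x)
    return found


F = [("X", "A"), ("A", "B"), ("B", "A")]
print(keys("XA", F))   # [{'X'}]
print(keys("AB", F))   # [{'A'}, {'B'}]
```

Enumerating subsets by increasing size guarantees minimality: any superset of a key is skipped before its closure is computed.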

Sagiv gives the following characterization of independent schemas produced by Biskup's algorithm.

We say that a relation scheme $R_j$ can add an attribute $A$ to $R_i$ if there is a key $K$ of $R_j$ such that $A \notin K$ and $R_j \subseteq R_i^+$. The relation schemes in R satisfy the uniqueness condition iff for every $R_i$:

a) no relation scheme (other than $R_i$) can add to $R_i$ an attribute already in $R_i$, and

b) if $A \in R_i^+ - R_i$, then there is a unique $R_j$ that can add $A$ to $R_i$.

Theorem 5.2. If R satisfies the uniqueness condition, then it is independent. [Sag]

We can show that this condition is not necessary for independence. Consider the dependencies $\{X \to A, A \to B, B \to A\}$, for which Biskup's algorithm generates $R = \{XA, AB\}$, with $X$ the key of $XA$ and both $A$ and $B$ keys of $AB$. Since $A$ is not an element of the key $B$, the scheme $AB$ can add $A$ to $XA$, which already contains $A$; so this schema violates property a) of the uniqueness condition. However, the reader should take the time to convince himself that it is independent.
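The counterexample can be checked against the uniqueness condition mechanically. The Python sketch reads "$R_j$ can add $A$ to $R_i$" as: $A \in R_j$, $A$ lies outside some key $K$ of $R_j$, and $R_j \subseteq R_i^+$. That reading, including the requirement $A \in R_j$, is an assumption of this sketch. Keys are supplied as stated in the text rather than computed.

```python
# The uniqueness condition under an assumed reading of "R_j can add A to R_i":
# A is in R_j, A is outside some key K of R_j, and R_j ⊆ closure(R_i).


def closure(attrs, fds):
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def can_add(Rj, keys_j, A, Ri, fds):
    return A in Rj and set(Rj) <= closure(Ri, fds) and any(A not in K for K in keys_j)


def uniqueness_violations(schema, fds):
    """List (property, R_i, A) for each failure of property a) or b)."""
    out = []
    for Ri in schema:
        for A in sorted(closure(Ri, fds)):
            adders = [Rj for Rj in schema
                      if Rj != Ri and can_add(Rj, schema[Rj], A, Ri, fds)]
            if A in Ri and adders:
                out.append(("a", Ri, A))
            elif A not in Ri and len(adders) != 1:
                out.append(("b", Ri, A))
    return out


F = [("X", "A"), ("A", "B"), ("B", "A")]
schema = {"XA": [{"X"}], "AB": [{"A"}, {"B"}]}   # keys as stated in the text
print(uniqueness_violations(schema, F))           # [('a', 'XA', 'A')]
```

The only violation reported is property a) at $XA$ and $A$, caused by the key $B$ of $AB$, as argued above.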

We will be concerned with characterizing independence in the general case. Although the Biskup algorithm guarantees certain desirable properties for the schemas it generates, we do not believe that all "correct" schemas necessarily have these properties. On the other hand, solving the general problem, in which R and $F$ are each arbitrarily chosen, is more than is required in practice, as R and $F$ are indeed not chosen arbitrarily. However, we feel that the general solution gives us insight into the nature of independence.

5.3.2.  Independence  in  the  general  case. 

We  assume  R  and  F  are  chosen  arbitrarily.  We  will  describe  properties  of  R 
and  F  which  we  will  show  are  necessary  for  R  to  be  independent  with  respect  to 
F.  We  will  give  conjunctions  of  these  properties  which  are  sufficient. 

5.3.2.1. Weak cover embedding.

Recall that

$F^+|R_i = \{ f \mid f \in F^+ \text{ and } f \text{ is embedded in } R_i \}$

Let $F^+|\mathbf{R} = \bigcup_i F^+|R_i$.

Property 1. $SATW(F) = SATW(F^+|\mathbf{R})$.

Lemma 5.2. Property 1 is necessary for $\mathbf{R}$ to be independent with respect to $F$.

Proof. $F \models F^+|\mathbf{R}$ implies $SATW(F) \subseteq SATW(F^+|\mathbf{R})$. Since $LSAT(F) \supseteq SATW(F^+|\mathbf{R})$, an element of $SATW(F^+|\mathbf{R}) - SATW(F)$ is a counterexample to the independence of $\mathbf{R}$; that is, it is an element of $LSAT(F) - SATW(F)$. ■

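We call property 1 weak cover embedding. A scheme which is cover embedding in the standard sense is also weakly cover embedding. Recall the following example from chapter 4. Let $\mathbf{R} = \{AB, BC\}$ and $F = \{A \to C\}$. As we argued previously, $\mathbf{R}$ is unconstrained by $F$; that is, $SATW(F) = SATW(\emptyset)$. Since $F^+|\mathbf{R}$ contains only trivial dependencies, $\mathbf{R}$ is weakly cover embedding, although $A \to C$ is not implied by the embedded dependencies. Thus weak cover embedding is strictly weaker than standard, or strong, cover embedding.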
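For small schemes, the embedded dependencies of the example can be listed by brute force. The Python sketch below is a hypothetical helper and makes a few assumptions: attributes are single characters, and dependencies are frozenset pairs. For every subset $X$ of a scheme it reports the attributes of the scheme outside $X$ that $X$ determines. It enumerates all subsets, so it is exponential in the size of the scheme, which is in keeping with the difficulty of finding $F^+|\mathbf{R}$ in general.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure of `attrs` under fds given as (lhs, rhs) frozenset pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return frozenset(result)

def embedded_nontrivial(scheme, fds):
    """Map X -> (X+ & scheme) - X for each subset X of `scheme` where that set is non-empty."""
    out = {}
    for size in range(len(scheme) + 1):
        for cand in combinations(sorted(scheme), size):
            x = frozenset(cand)
            extra = (closure(x, fds) & scheme) - x
            if extra:
                out[x] = extra
    return out

F = [(frozenset("A"), frozenset("C"))]          # F = {A -> C}
print(embedded_nontrivial(frozenset("AB"), F))  # {}: only trivial fds embedded in AB
print(embedded_nontrivial(frozenset("BC"), F))  # {}: only trivial fds embedded in BC
```

Both schemes of $\mathbf{R} = \{AB, BC\}$ embed only trivial dependencies. A scheme such as $AC$, by contrast, would embed $A \to C$.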

For any state of a weakly cover embedding schema, it is sufficient to chase the tableau of the state with respect to the embedded dependencies only. By a result due to Beeri and Honeyman [BH], finding $F^+|\mathbf{R}$ is probably difficult when $F^+|\mathbf{R}$ is not equivalent to $F$, i.e., when $\mathbf{R}$ is not strongly cover embedding.

There is currently no known test for weak cover embedding. In spite of this, we will assume from now on that we have not only $F^+|\mathbf{R}$, but even more strongly each of the

5.3.2.2. Consistency.

To develop our next property, we reconsider the derivation expressions of chapter 4. Those expressions took an instance of the universal relation as a variable. We wish to redefine these expressions to take a state of the database in place of such a universal instance. We do this by replacing the references to the instance $I$ in such an expression by appropriate references to $p(R_i)$.

Let $Dt$ be a derivation tree for $X \to B$ whose expression $\delta(\gamma(Dt))$ is as defined in chapter 4. For a schema $\mathbf{R}$ we define a transformation $\sigma_{\mathbf{R}}$, and we call $\sigma_{\mathbf{R}}(\delta(\gamma(Dt)))$ the strong derivation expression for $X \to B$ in $\mathbf{R}$.

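Let $\delta(\gamma(Dt))$ be

$\pi_B(\sigma_{F_t}(\psi_1 \bowtie \cdots \bowtie \psi_k \bowtie I))$

where $k \ge 0$, $F_t$, if present, is of the form $A_1 = a_1 \wedge \cdots \wedge A_m = a_m$, and $\psi_i$, if present, is $\delta(\gamma(Dt_i))$, where $Dt_i$ is the $i$th non-trivial subtree of $Dt$. Suppose that the root dependency of $Dt$, i.e., the dependency which was used to attach the immediate descendants of the root of $Dt$, is embedded in the scheme $R_j$ of $\mathbf{R}$. Then $\sigma_{\mathbf{R}}(\delta(\gamma(Dt)))$ is

$\pi_B(\sigma_{F_t}(\sigma_{\mathbf{R}}(\psi_1) \bowtie \cdots \bowtie \sigma_{\mathbf{R}}(\psi_k) \bowtie p(R_j)))$

If $Dt$ is trivial, then $\sigma_{\mathbf{R}}(\delta(\gamma(Dt))) = \delta(\gamma(Dt)) = \gamma(Dt)$, a formal variable. As before, the strong expression for $f: X \to B$ generated by $Dt$ is

$\psi_f = \lambda p.\lambda x.\,\sigma_{\mathbf{R}}(\delta(\gamma(Dt)))$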

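Example 5.1. Let $F = \{X \to Y, YW \to Z\}$ and $\mathbf{R} = \{XY, YWZ\}$. The tree

[Figure: derivation tree whose root Z has children Y and W, with X the child of Y.]

generates the expression

$\pi_Z(\sigma_{W=w}(\pi_Y(\sigma_{X=x}(p(XY))) \bowtie p(YWZ)))$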

As in chapter 4, we can remove selection altogether by using singleton relations, rewriting the expression as

$\pi_Z([w] \bowtie (\pi_Y([x] \bowtie p(XY)) \bowtie p(YWZ)))$

a form which is more conducive to proofs by induction. ■
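To make the evaluation concrete, the Python sketch below evaluates the expression of example 5.1 on a tiny state. The encoding is hypothetical and not from the text: each relation is a list of dicts, natural join and projection are implemented directly, and the singleton relations $[x]$ and $[w]$ are one-tuple lists. An empty result means the expression is undefined at that value.

```python
def join(r, s):
    """Natural join of two relations given as lists of dicts."""
    out = []
    for t in r:
        for u in s:
            common = t.keys() & u.keys()
            if all(t[a] == u[a] for a in common):
                out.append({**t, **u})
    return out

def project(r, attrs):
    """Projection onto `attrs`, removing duplicate tuples."""
    result = []
    for t in r:
        row = {a: t[a] for a in attrs}
        if row not in result:
            result.append(row)
    return result

def expr_5_1(state, x, w):
    """pi_Z([w] join (pi_Y([x] join p(XY)) join p(YWZ)))"""
    inner = project(join([{"X": x}], state["XY"]), ["Y"])
    return project(join([{"W": w}], join(inner, state["YWZ"])), ["Z"])

state = {
    "XY": [{"X": 1, "Y": 10}, {"X": 2, "Y": 20}],
    "YWZ": [{"Y": 10, "W": 5, "Z": 100}, {"Y": 20, "W": 5, "Z": 200}],
}
print(expr_5_1(state, 1, 5))  # [{'Z': 100}]
print(expr_5_1(state, 1, 6))  # []: undefined at (x, w) = (1, 6)
```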

Two features of this definition must be mentioned. First, we recognize that a given fd may be embedded in more than one scheme of $\mathbf{R}$. Thus the strong expression is not uniquely determined by $Dt$. The scheme actually chosen in an expression is said to be utilized for the fd by the expression. We will allow every choice of an embedding scheme for a dependency in $Dt$. A given tree may therefore produce a multitude of strong derivation expressions.

Secondly, some fd used to construct $Dt$ may be embedded in no scheme of $\mathbf{R}$. There will then be no strong expression for $Dt$. As a consequence of lemma 5.2, if the schema is independent we need consider only derivations based on some cover of the embedded dependencies. Such a cover which is itself embedded can always be found. The non-existence of strong expressions for some derivation trees is therefore irrelevant.

As before, we say that $\psi_f$ is defined at a value $x$ if $(\psi_f(p))(x)$ is other than the empty set. Note also that the following property carries over: if $(\psi_f(p))(x)$ is not empty, it is a singleton set whenever $p \in LSAT(F)$.

Let $\psi_f$, $\vartheta_f$ be two distinct strong expressions for the same fd $f$. We say that $\psi_f$, $\vartheta_f$ are consistent within $LSAT(F)$ (or just consistent) if, for all states $p \in LSAT(F)$ and all values $x$ such that both $\psi_f(p)$ and $\vartheta_f(p)$ are defined at $x$, $(\psi_f(p))(x) = (\vartheta_f(p))(x)$.

Property 2. Let $f: R \to A \in F^+$, where $R$ is a scheme of $\mathbf{R}$. We specifically include the possibility $A \in R$, so that $f$ may be trivial. All pairs of expressions $\psi_f$, $\vartheta_f$ for $f$ are consistent.

A schema which has property 2 will also be called consistent. To determine the consistency of the schema, we must test all pairs of expressions for each dependency of the form given in the definition. As we remarked previously, there may be infinitely many such pairs. Our first task is to demonstrate that only finitely many of them need be tested.

Recall that a derivation tree is called bounded if no attribute repeats on any root to leaf path. Let $\psi$ be an expression generated by a non-bounded tree $Dt$. Within $Dt$, let $n_1$, $n_2$ be nodes on some root to leaf path labelled by the same attribute. Assume, without loss of generality, that $n_1$ is higher (i.e., closer to the root) than $n_2$. Let $\psi_{n_1}$ be the subexpression of $\psi$ for the tree rooted at $n_1$, and $\psi_{n_2}$ the subexpression for the tree rooted at $n_2$. Let $\psi_{n_1/n_2}$ be the expression formed by replacing $\psi_{n_1}$ with $\psi_{n_2}$ in $\psi$. Note that, by augmentation, the pairs $\psi_{n_1}, \psi_{n_2}$ and $\psi, \psi_{n_1/n_2}$ are each expressions for the same dependency.

Lemma 5.3. If $\psi_{n_1}$ is consistent with $\psi_{n_2}$, then $\psi$ is consistent with $\psi_{n_1/n_2}$. Indeed, $\psi_{n_1/n_2}(p) \supseteq \psi(p)$ for all $p \in LSAT(F)$.

Proof. Assume $\psi_{n_1}$, $\psi_{n_2}$ are consistent. If $\psi$ is defined in a state $p$ at a value $x$, then $\psi_{n_1}$ and $\psi_{n_2}$ are defined and $(\psi_{n_1}(p))(x) = (\psi_{n_2}(p))(x)$. Consequently, $(\psi_{n_1/n_2}(p))(x) = (\psi(p))(x)$. ■

We will say that a derivation tree is 1-bounded if no attribute other than that labelling the root repeats on any root to leaf path, and the root attribute labels no more than one other node on any root to leaf path.
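The 1-bounded condition concerns root-to-leaf paths only, so it can be checked with a single depth-first traversal. The minimal Python sketch below assumes a hypothetical encoding of a derivation tree as nested `(attribute, [subtrees])` pairs.

```python
def is_one_bounded(tree):
    """tree = (attribute, [subtrees]); check the 1-bounded condition on every path."""
    root_attr = tree[0]

    def ok(node, seen, root_count):
        attr, children = node
        if attr == root_attr:
            root_count += 1
            if root_count > 2:      # the root attribute may label at most one other node
                return False
        elif attr in seen:          # no other attribute may repeat on the path
            return False
        seen = seen | {attr}
        return all(ok(child, seen, root_count) for child in children)

    return ok(tree, frozenset(), 0)

# Root A reappears once below D: allowed.
print(is_one_bounded(("A", [("D", [("A", []), ("C", [("B", [])])])])))  # True
```

A path such as A, A, A fails the check, and so does any path on which a non-root attribute repeats.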

Lemma 5.4. A schema $\mathbf{R}$ is consistent if and only if, for each dependency $R \to A$ with $R$ a scheme of $\mathbf{R}$, all pairs of expressions generated by 1-bounded derivation trees for $R \to A$ are consistent.




Proof. The necessity is immediate. For the sufficiency, any non-1-bounded tree may be reduced to a 1-bounded tree by iteration of lemma 5.3. As the resulting expression contains the original, if it is consistent with a second expression, the original is consistent with it as well. ■

We remark that it is not legitimate to continue to apply lemma 5.3 so as to reduce the problem to the study of height bounded trees: to make that reduction we would need to know that a 1-bounded tree is consistent with a height bounded tree. Nevertheless, because lemma 5.4 bounds the height of the derivation trees we must consider, we need test only finitely many expression pairs for consistency.

Next we show that, without loss of generality, we may further restrict the class of expressions we must examine. Recall that strong derivation expressions utilize only embedded dependencies. We will show that we need consider only derivations and expressions for a dependency $R \to A$ based on a non-redundant cover of the set $\bigcup_{R_i \ne R} F^+|R_i$. (This is not necessarily the set $F^+ - F^+|R$, the set of dependencies not embedded in $R$.) Suppose $R$ is utilized for some dependency of the tree. If $A$ is a non-leaf attribute of this fd, i.e., not a leaf of the tree, the subtree rooted at this node is, possibly by augmentation, a tree for the trivial dependency $R \to A$. If this expression is consistent with the expression for the empty derivation, then by lemma 5.3 we may remove it from the tree. If the expression for the result is consistent with a given expression, so is the original. If the new expression is not consistent with a given expression, then the schema is not consistent and the test for the original expression is irrelevant.

If all the dependencies in the expression utilize $R$, we will have reduced the tree to the empty derivation of the trivial dependency. Consistency thus ignores derivations for $R \to A$ wholly contained in $R$. The universal relation scheme is therefore a consistent schema for any set of fd's. This is reassuring, since it is also independent.




In short, we have reduced the problem of testing for consistency to the one given in the following lemma.

Lemma 5.5. A schema is consistent if and only if, for each dependency $R \to A$, all pairs of expressions which

i) are 1-bounded, and

ii) do not utilize $R$ for any dependency

are consistent. ■

The  test  for  consistency  of  a  pair  of  expressions  is  tableau  based.  The 
tagged  tableau  for  a  derivation  expression  can  be  constructed  directly  from  the 
derivation  tree: 

1)  the  summary  row  has  a  distinguished  variable  (dv)  in  the  column  of  the 
attribute  labelling  the  root  of  the  tree; 

2)  for  every  use  of  an  fd  in  the  derivation  tree  there  is  a  row  of  the  tableau 
whose  values  are  set  according  to  the  procedure: 

2.1) for a node of the fd (considered as a building block of the tree) which is a leaf of the tree, that column has the formal constant '0';

2.2)  a  node  of  the  fd  which  is  the  root  of  the  tree  has  a  dv; 

2.3)  otherwise  the  node  is  used  by  some  other  fd  of  the  tree;  the  value  in 
the  two  rows  in  this  column  is  the  same  non-distinguished  variable 
(ndv)  appearing  nowhere  else  in  the  tableau; 

2.4) the row's tag is the relation scheme which is utilized by the expression for the fd.

3) Symbols not specifically set by the above procedures are uniquely appearing ndvs.

We call the 0's appearing in this tableau formal constants, since they behave as constants and as distinguished variables in different settings. When constructing the tableau from the expression via the procedure of [ASU], the formal constants must be treated as constants, since they appear in select formulae. When evaluating the tableau in a state, we will interpret the formal constants as variables. For the test to be described, they will again be treated as constants.

For the empty derivation of a trivial dependency, whose derivation tree is the trivial one and uses no fd's, these three steps produce an empty tableau. This is not acceptable, and therefore we include an additional row according to the following rule.

4) A row with formal constants in all the attributes of the left hand side of the dependency, the scheme $R$, with tag $R$.

The addition of this row changes the expression corresponding to the tableau. If the original expression was $\lambda p.\lambda r.\psi$, the new expression is

$\lambda p.\lambda r.\,\psi \bowtie \pi_{\emptyset}(\sigma_{R=r}(p(R)))$

This expression will be defined for a given $R$-value $r$ only if $p(R)$ contains a tuple $t = r$. When such a tuple does exist, the new expression is equal to the original. This can be proven from the definitions of the relational algebra operators. As the identity is a containment mapping from the tableau without the additional row to the tableau with it, we again have that the new expression is contained in the original.

We  will  show  momentarily  that  this  modification  loses  no  generality.  The  row 
created  by  this  fourth  step  demonstrates  that  we  are  treating  augmentation 
differently  in  this  chapter  than  we  have  previously. 

If the original expression corresponded to the empty derivation of a trivial dependency, the row produced by step 4 is the only row. In this case the original expression was of the form $\lambda p.\lambda r.a$, where $a$ is the formal variable for the attribute $A$. This expression is total. The new expression is

$\lambda p.\lambda r.\,\pi_A(\sigma_{R=r}(p(R)))$

which, as in the prior case, is contained in the original expression. The summary row of the tableau for this expression contains a constant rather than a distinguished variable. We therefore continue rule 4.

4) (contd.) If this is the only row of the tableau, change the distinguished variable in the summary row to the formal constant 0.
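Rules 1-4 translate directly into a recursive walk of the derivation tree. The minimal Python sketch below makes several assumptions that are not in the text:

- A derivation tree is a hypothetical `Node` carrying its attribute, its children, and the scheme utilized for the fd that attaches those children.
- The dv is written "a", the shared ndvs "b1", "b2", ..., and the formal constant "0".
- The uniquely appearing ndvs of rule 3 are left implicit as missing columns.

```python
from dataclasses import dataclass, field
from itertools import count

@dataclass
class Node:
    attr: str                                     # attribute labelling this node
    children: list = field(default_factory=list)  # immediate descendants
    tag: str = ""                                 # scheme utilized for the fd attaching the children

def build_tableau(tree, lhs_scheme):
    """Sketch of rules 1-4. Returns (summary, rows), each row a (dict, tag) pair.
    Columns missing from a row dict stand for uniquely appearing ndvs (rule 3)."""
    fresh = count(1)
    rows = []

    def visit(node, own_symbol):
        # rule 2: one row per use of an fd, i.e. per internal node of the tree
        row = {node.attr: own_symbol}             # dv at the root (2.2), shared ndv otherwise (2.3)
        for child in node.children:
            if not child.children:
                row[child.attr] = "0"             # rule 2.1: leaf of the tree
            else:
                link = f"b{next(fresh)}"          # rule 2.3: ndv shared by exactly two rows
                row[child.attr] = link
                visit(child, link)
        rows.append((row, node.tag))              # rule 2.4: tag is the utilized scheme

    summary = {tree.attr: "a"}                    # rule 1: dv in the root column
    if tree.children:
        visit(tree, "a")
    rows.append(({x: "0" for x in lhs_scheme}, lhs_scheme))  # rule 4
    if len(rows) == 1:                            # rule 4 (contd.): the only row
        summary = {tree.attr: "0"}
    return summary, rows

dt1 = Node("F", [Node("B"), Node("D", [Node("A"), Node("B")], tag="ABCDEF")], tag="ABCDEF")
summary, rows = build_tableau(dt1, "ABC")
```

Applied to tree $Dt_1$ of example 5.2 with $R = ABC$, this yields the same three rows as that example's first tableau. Only the names of the shared ndvs differ.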

As mentioned, the tableau constructed by rules 1-4 corresponds to an expression contained in an expression for $R \to A$. We argued that the expression for the tableau is the original expression with its domain restricted to values in $p(R)$ for a given $p$. We now show that testing such expressions for consistency is equivalent to testing the original expressions.

Lemma 5.6. Two expressions satisfying the restraints of lemma 5.5 for a dependency $R \to A$ are consistent in $LSAT(F)$ if and only if, for every $p \in LSAT(F)$, they are consistent on the values of $p(R)$.

Proof. Again the necessity is immediate. Suppose $\psi$, $\vartheta$ are inconsistent expressions which are nevertheless consistent on the values of $p(R)$ for every $p \in LSAT(F)$. That is, for all values $t \in p(R)$ at which both expressions are defined in $p$, $(\psi(p))(t) = (\vartheta(p))(t)$. Yet for some $p \in LSAT(F)$ there exists a value $u \notin p(R)$ at which both are defined and distinct. The state $p'$ formed from $p$ by discarding $p(R)$ and setting $p'(R) = \{u\}$ is locally satisfying. Since neither $\psi$ nor $\vartheta$ utilizes $R$ for any fd, both are defined in $p'$ at $u$ and equal to their values in $p$ at $u$. Thus they are inconsistent in $p'$ at a value in $p'(R)$, a contradiction. ■

Lemma 5.6 does not reduce the number of expressions whose pairwise consistency must be tested, but it does allow us to test the reduced expressions corresponding to the tableaux of rules 1-4. We may now present the test for consistency of two strong derivation expressions.




(Consistency test.) Let $\psi$, $\vartheta$ be two expressions for the same dependency whose left hand side is a scheme of $\mathbf{R}$. Let $T_\psi$, $T_\vartheta$ be the tableaux constructed by rules 1-4. If necessary, rename the variables in $T_\psi$, $T_\vartheta$ so that they have only the constants in common. Union the rows of the tableaux, discarding the summaries. Now apply the local chase to the result. This requires that every transformation $\langle X \to A, [r, s] \rangle$ satisfy

$r[Tag] = s[Tag]$
$XA \subseteq r[Tag]$

The expressions $\psi$, $\vartheta$ are consistent in $LSAT(F)$ iff the symbols in the summary rows of $T_\psi$, $T_\vartheta$ are equated in the result. ■
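The test is a small fixpoint computation. The Python sketch below is one possible rendering, built on several assumptions of our own:

- Rows are dicts paired with a frozenset tag.
- The formal constant is the string "0".
- Symbols are equated with a union-find structure.
- The helper `row` is hypothetical.

Because 0 is the only constant, equating two symbols can never clash. The data are the combined tableau of example 5.2, whose summary symbols are a1 and a2.

```python
from itertools import combinations, count

CONST = "0"                     # the formal constant, the only constant in these tableaux
_fresh = count(1)

def row(tag, values):
    """Hypothetical helper: a tagged row whose unset columns get uniquely appearing ndvs."""
    return {a: values.get(a, f"n{next(_fresh)}") for a in tag}, frozenset(tag)

def local_chase(rows, fds):
    """Apply <X -> A, [r, s]> only when r[Tag] = s[Tag] and XA is within the tag.
    Returns find(), which maps every symbol to the representative of its class."""
    parent = {}

    def find(sym):
        parent.setdefault(sym, sym)
        while parent[sym] != sym:
            sym = parent[sym]
        return sym

    def equate(s, t):
        rs, rt = find(s), find(t)
        if rs == rt:
            return False
        if rt == CONST:         # keep the constant as the class representative
            rs, rt = rt, rs
        parent[rt] = rs
        return True

    changed = True
    while changed:              # repeat until no transformation applies
        changed = False
        for (r, r_tag), (s, s_tag) in combinations(rows, 2):
            if r_tag != s_tag:
                continue
            for lhs, a in fds:
                if lhs | {a} <= r_tag and all(find(r[x]) == find(s[x]) for x in lhs):
                    changed |= equate(r[a], s[a])
    return find

def consistency_test(rows, fds, summary_1, summary_2):
    """Pass iff the local chase equates the two summary symbols."""
    find = local_chase(rows, fds)
    return find(summary_1) == find(summary_2)

ABCDEF = "ABCDEF"
EXAMPLE_5_2 = [
    row("ABC", {"A": "0", "B": "0", "C": "0"}),
    row(ABCDEF, {"A": "0", "B": "0", "D": "b1"}),
    row(ABCDEF, {"B": "0", "D": "b1", "F": "a1"}),
    row(ABCDEF, {"A": "0", "C": "0", "E": "b2"}),
    row(ABCDEF, {"C": "0", "E": "b2", "F": "a2"}),
]
F_5_2 = [(frozenset(x), a) for x, a in
         [("A", "F"), ("AB", "D"), ("BD", "F"), ("AC", "E"), ("CE", "F")]]
print(consistency_test(EXAMPLE_5_2, F_5_2, "a1", "a2"))  # True
```

If $A \to F$ is dropped from the dependency list, the same tableau fails the test. This matches the role that example 5.2 attributes to that dependency.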

Before  we  prove  the  correctness  of  the  consistency  test,  we  point  out  the 
principal  feature  of  the  local  chase. 

Lemma  5.7.  The  local  chase  preserves  equivalence  of  expressions  over  the 
set  of  locally  satisfying  database  states,  i.e.,  over  LSAT(F). 

Proof.  The  proof  is  a  simplification  of  the  proofs  in  chapter  3  and  is  omitted. 
Note  that  the  local  chase  does  not  change  the  tableau  for  an  element  of 
LSAT(F).  ■ 

We  say  two  expressions  pass  or  fail  the  consistency  test  according  to 
whether  that  test  judges  them  consistent  or  not. 

Lemma 5.8. Expressions $\psi$, $\vartheta$ are consistent if and only if they pass the consistency test.

Proof. (Only if) Suppose the expressions fail. Let $T^+$ be the tableau resulting from the test. Let $a$ be the symbol in the summary row of $T_\psi$ and $a'$ the symbol in the summary row of $T_\vartheta$. These symbols have not been equated in $T^+$. For notational convenience, assume they have not been changed, either.

We can form a locally satisfying state from $T^+$ in a straightforward way. Let $\varphi$ be any injection from the symbols of $T^+$ to the domains of $\mathbf{R}$. The state $p$ is given by

$p(R) = \{ \varphi(r)[R] \mid r \in T^+ \text{ and } r[Tag] = R \}$

It is clear that $(\psi(p))(\varphi(\bar 0)) = \varphi(a)$ and $(\vartheta(p))(\varphi(\bar 0)) = \varphi(a')$. Since $a \ne a'$ and $\varphi$ is an injection, $\varphi(a) \ne \varphi(a')$, demonstrating the inconsistency of $\psi$ and $\vartheta$.

(If) Suppose the expressions pass. Let $T$ be the tableau formed at the beginning of the consistency test and $T^+$ the result of the local chase applied to $T$.

Suppose a value $r$ is in the domain of both $\psi(p)$ and $\vartheta(p)$ for some $p \in LSAT(F)$. This requires that there exist valuations $\mu$, $\eta$ from $T_\psi$ and $T_\vartheta$ respectively such that $\mu(\bar 0) = \eta(\bar 0) = r$. We claim that the mapping $\xi$ from $T$ to $T_p$ defined by

$\xi(t) = \mu(t)$ if $t \in T_\psi$, and $\xi(t) = \eta(t)$ if $t \in T_\vartheta$

is a valuation function. This is immediate from the fact that, by assumption, the only symbols shared by $T_\psi$ and $T_\vartheta$ are the formal constants, on which $\mu$ and $\eta$ agree. Thus $\xi$ is functional.

By lemma 5.7, $\xi$ is a valuation function from $T^+$ to $T_p$, since $p$ is locally satisfying. If $a$ is the summary symbol of $T_\psi$ and $a'$ the summary symbol of $T_\vartheta$, we can calculate

$\mu(a) = \xi(a) = \xi(a') = \eta(a')$

since $a = a'$ in $T^+$. Thus $(\psi(p))(r) = (\vartheta(p))(r)$. ■
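The state used in the (only if) direction of the proof can be built mechanically. The minimal Python sketch below assumes that rows are (dict, tag) pairs whose dicts cover every attribute of the tag. It takes the integers as the target domain of the hypothetical injection `phi` and treats formal constants like any other symbol.

```python
def state_from_tableau(rows):
    """Build p(R) = { phi(r)[R] | r in T+ and r[Tag] = R } for an injection phi."""
    phi = {}

    def value(sym):
        return phi.setdefault(sym, len(phi))  # each new symbol gets a fresh integer

    state = {}
    for r, tag in rows:
        attrs = sorted(tag)                   # fixed column order within each scheme
        state.setdefault(tag, set()).add(tuple(value(r[a]) for a in attrs))
    return state, phi

rows = [({"A": "0", "B": "x"}, "AB"),
        ({"A": "0", "B": "y"}, "AB"),
        ({"B": "x", "C": "z"}, "BC")]
state, phi = state_from_tableau(rows)
print(state)  # {'AB': {(0, 1), (0, 2)}, 'BC': {(1, 3)}}
```

Distinct symbols always receive distinct values. This injectivity is the property the proof relies on to keep $a$ and $a'$ apart.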

The simplest, most compelling proof of the necessity of consistency for independence recognizes that, when the consistency test fails, it produces a counterexample to independence.

Theorem  5.3.  If  a  schema  R  is  independent,  then  it  is  consistent. 

Proof. If $\mathbf{R}$ is not consistent, two expressions fail the consistency test. The tableau $T^+$ which demonstrates this inconsistency is, or may be mapped by an injection onto, the tableau $T_p$ for a locally satisfying state $p$. This state is surely not satisfying, as the two expressions can each become transformation sequences of the general (as opposed to the local) chase (see chapter 4). Thus $\mathbf{R}$ is not independent. ■

* The formal constants are here being treated as variables.

Example 5.2. Consider the system given by

$\mathbf{R} = \{ABC, ABCDEF\}$
$F = \{A \to F, AB \to D, BD \to F, AC \to E, CE \to F\}$

We present two trees for the dependency $ABC \to F$,

[Figure: Dt1 derives F from B and D, with D derived from A and B (using BD → F and AB → D); Dt2 derives F from C and E, with E derived from A and C (using CE → F and AC → E).]

whose tableaux are as follows (blank entries are uniquely appearing ndvs):

Dt1
           A   B   C   D   E   F   Tag
summary                        a1
           0   0   0               ABC
           0   0       b1          ABCDEF
               0       b1      a1  ABCDEF

Dt2
           A   B   C   D   E   F   Tag
summary                        a2
           0   0   0               ABC
           0       0       b2      ABCDEF
                   0       b2  a2  ABCDEF

The combined tableau for these trees is

Nbr        A   B   C   D   E   F   Tag
0          0   0   0               ABC
1          0   0       b1          ABCDEF
2              0       b1      a1  ABCDEF
3          0       0       b2      ABCDEF
4                  0       b2  a2  ABCDEF

We apply the local chase sequence

$\langle BD \to F, [1, 2] \rangle$
$\langle CE \to F, [3, 4] \rangle$
$\langle A \to F, [1, 3] \rangle$

which sets $a_1 = a_2$, demonstrating the consistency of the expressions generated by the trees. Observe the role played by the dependency $A \to F$, which is used in neither tree. Thus consistency may not always be demonstrated by local tree manipulation procedures. Observe as well that the expressions for both trees are contained in the expression for the tree using only $A \to F$. ■

Example 5.3. We now present an example of a non-empty derivation of a trivial dependency which is consistent with the empty derivation. Let

$\mathbf{R} = \{AB, BC, ACD\}$
$F = \{B \to C, AC \to D, D \to A\}$

The non-empty derivation of $AB \to A$ is

[Figure: derivation tree in which A is derived from D (D → A), D from A and C (AC → D), and C from B (B → C).]

whose tableau is

Nbr        A   B   C   D   Tag
0          0   0           AB
1              0   b1      BC
2          0       b1  b2  ACD
3          a1          b2  ACD

The local chase is simply

$\langle D \to A, [2, 3] \rangle$

which sets $a_1 = 0$, as required. ■

We may generalize example 5.3 so as to prove a necessary condition governing consistency of trivial dependencies. The root relation of an expression is the relation utilized by the expression for the root dependency of the tree generating the expression.

Lemma 5.9. Let $\mathbf{R}$ be a consistent schema. For some $R \in \mathbf{R}$, let $R \to A$ be a trivial dependency for which a non-empty expression exists not utilizing $R$ for any dependency. Let $X$ be the set of labels on the leaves of the tree generating this expression. Then

i) $X \to A$ is a trivial dependency, i.e., $A \in X$;

ii) some dependency for which the root relation is utilized by the expression has $A$ as an attribute of its left hand side. Further, if the tree is 1-bounded, this node is a leaf of the derivation tree.

We  delay  the  proof  of  lemma  5.9  to  derive  a  helpful  technical  lemma  about 
the  local  chase. 

Lemma 5.10. Let $T$ be any tagged tableau. Let the $R$-tableau of $T$ be the subset of the rows of $T$ having tag $R$. Suppose that, for some attribute $A$, no symbol in the $A$-column of the $R$-tableau appears in the $A$-column of any $S$-tableau of $T$ for $R \ne S$. If $T^+$ is the result of the local chase of $T$, then no symbol in the $A$-column of the $R$-tableau of $T^+$ appears in the $A$-column of any $S$-tableau of $T^+$.

Proof. Let $\langle X \to A, [u, v] \rangle$ be the first transformation applied in the calculation of $T^+$ which sets a symbol $t[A]$, for $t$ in the $R$-tableau, equal to a symbol $w[A]$, for $w$ in the $S$-tableau for some $S \ne R$. We must have, possibly after renaming, $t[A] = u[A]$ and $v[A] = w[A]$ before execution of this transformation. By the nature of the local chase, the rows $u$, $v$ are both elements of the $Q$-tableau for some $Q$. If $R \ne Q$, then by $t[A] = u[A]$ this is not the first transformation with the intended effect. Similarly if $R = Q \ne S$, by $v[A] = w[A]$. Finally, $R = Q = S$ implies this transformation does not have the intended effect. ■


Proof of lemma 5.9. i) Since $\mathbf{R}$ is consistent, the expression for $R \to A$ must be consistent with the empty derivation. The test for this consistency operates on the tableau of the non-empty expression for $R \to A$. The summary symbol of this tableau is a distinguished variable, not a constant. If $A \notin X$, then the formal constant 0 appears in column $A$ only in the $R$-tableau. By lemma 5.10, this remains true after the local chase has completed. The distinguished symbol of the tableau therefore does not become 0, contrary to the assumption that the non-empty derivation expression is consistent with the empty one.

ii) Let $S$ be the root relation. If no symbol of the $A$-column of the $S$-tableau appears in any $Q$-tableau, then the distinguished symbol fails to become 0, as in part i. Since the distinguished variable appears in exactly one row of the tableau, some other symbol of the $A$-column of the $S$-tableau repeats in a $Q$-tableau. If this symbol is a formal constant, we are done. Otherwise, it is a repeated ndv, and $A$ must appear in some dependency of the tree, other than the root dependency, for which $S$ is utilized. If $A$ is a left hand attribute of that dependency, we note, by part i, that the tree is not 1-bounded. Otherwise, if $A$ appears on the right, we can find within the given tree a smaller tree for the dependency $R \to A$ with $S$ as the root relation. The proof may then be completed by induction. ■

It should be clear on intuitive grounds that a schema which embeds a given functional dependency in two or more schemes will not be independent. The two schemes are free in $LSAT(F)$ to assign conflicting functions to the dependencies. Consistent schemas disallow this behaviour and further require that the embedded dependencies be embedded non-redundantly.

Proposition 5.2. Let $\mathbf{R}$ be a consistent schema. Then

i) $i \ne j$ implies that $F^+|R_i \cap F^+|R_j$ contains only trivial dependencies;

ii) if for each $i$, $C_i$ is a non-redundant cover of $F^+|R_i$, then $\bigcup_i C_i$ is a non-redundant cover of $F^+|\mathbf{R}$.

Proof. i) Suppose $F^+|R_i \cap F^+|R_j$ contains $X \to A$, which is not trivial. The trivial dependency $R_i \to A$ has the empty derivation, and by augmentation it also has the non-empty derivation through $X \to A$ utilizing $R_j$. $X \to A$ is not trivial by assumption. So $\mathbf{R}$ is not consistent, by lemma 5.9.

ii) $\bigcup_i C_i$ is certainly a cover of $F^+|\mathbf{R}$, so we need only prove it is non-redundant. Assume $f$ is redundant in $\bigcup_i C_i$. There is a bounded derivation tree of $f$ based on $\bigcup_i C_i - \{f\}$. If $f \in C_j$, then this tree must contain some dependency in $\bigcup_{i \ne j} C_i$. We can find within this tree a subtree based on $\bigcup_{i \ne j} C_i$ with leaf attributes $X$ and root $A$ such that $X \subseteq R_j$. But $X \to A$ is not trivial, by the height boundedness of the original tree. So the expression for $R_j \to A$ generated by this tree is not consistent with the empty expression, by lemma 5.9. ■

By part i of proposition 5.2, we no longer need consider which relation is utilized by an expression for a given fd, as there is only one choice in a consistent schema.

We can demonstrate some prerequisites for a pair of expressions for non-trivial dependencies to be consistent. Recall that the relation utilized by an expression for the root dependency of the tree which generates the expression is called the root relation of the expression. Consider the components (the maximal connected subgraphs) of the subgraph formed from the derivation tree after removal of all edges generated by fd's not in the root relation. Each component is a derivation tree for some dependency embedded in the root relation. The frontier of the expression is the set of attributes labelling the leaves of the component containing the root of the derivation tree. For the trees in example 5.2, $AB$ is the frontier of $Dt_1$ and $AC$ is the frontier of $Dt_2$. $AC$ is also the frontier of the tree in example 5.3.
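The frontier can be read off a derivation tree in one traversal: an edge is kept only when the fd that generates it is utilized in the root relation. The minimal Python sketch below uses a hypothetical `Node` encoding (attribute, children, and the scheme utilized for the fd attaching the children). It reproduces the three frontiers just mentioned.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    attr: str
    children: list = field(default_factory=list)
    tag: str = ""   # scheme utilized for the fd attaching `children`

def frontier(tree):
    """Leaf labels of the root's component after deleting edges whose fd is not in the root relation."""
    root_rel = tree.tag
    leaves = set()

    def walk(node):
        # follow the edges below `node` only if its fd is utilized in the root relation
        if node.children and node.tag == root_rel:
            for child in node.children:
                walk(child)
        else:
            leaves.add(node.attr)   # a leaf of the root's component

    walk(tree)
    return frozenset(leaves)

dt1 = Node("F", [Node("B"), Node("D", [Node("A"), Node("B")], tag="ABCDEF")], tag="ABCDEF")
dt2 = Node("F", [Node("C"), Node("E", [Node("A"), Node("C")], tag="ABCDEF")], tag="ABCDEF")
ex53 = Node("A", [Node("D", [Node("A"), Node("C", [Node("B")], tag="BC")], tag="ACD")], tag="ACD")
print(sorted(frontier(dt1)), sorted(frontier(dt2)), sorted(frontier(ex53)))
# ['A', 'B'] ['A', 'C'] ['A', 'C']
```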

Proposition 5.3. Let $\mathbf{R}$ be consistent. Let $\psi$, $\vartheta$ be expressions for the non-trivial dependency $R \to A$, for $R \in \mathbf{R}$. Then

i) the root relations of $\psi$, $\vartheta$ are the same;

ii) if $L_1$ is the frontier of $\psi$ and $L_2$ that of $\vartheta$, then $L_1 \cap L_2 \to A$;

iii) let $M_1$ be the set of attributes labelling the leaves of the tree generating $\psi$, and $M_2$ the leaves of the tree generating $\vartheta$. If neither $\psi$ nor $\vartheta$ utilizes $R$ for any dependency, then $M_1 \cap M_2 \to A$.

Proof. i) Let $S$ be the root relation of $\psi$. If any symbol in column $A$ of the $S$-tableau of $T_\psi$ appears in column $A$ of some $Q$-tableau, then consider the expression generated by the subtree rooted at the lower node. Thus we may assume that no symbol in the $A$-column of the root tableau of either $T_\psi$ or $T_\vartheta$ repeats. If the root relations of $\psi$, $\vartheta$ are not the same, then, by lemma 5.10, $\psi$ and $\vartheta$ fail the consistency test.

ii) Assume $\psi$, $\vartheta$ do not have this property. We will directly construct an element $p$ of $LSAT(F)$ which demonstrates their inconsistency. By i, we know the root relations of $\psi$, $\vartheta$ are the same. Construct $p$ such that, for $S$ not the root relation, $p(S)$ contains a single tuple. For convenience, let each of these tuples be a vector of all 0's. Construct a two tuple instance of the root relation with

1) a tuple of all 0's except on $L_2 - (L_1 \cap L_2)^+$, where it is all 1's,

2) a tuple of all 1's except on $L_2 \cup (L_1 \cap L_2)^+$, where it is all 0's.

$p \in LSAT(F)$, since the two tuples agree on exactly $(L_1 \cap L_2)^+$. The first tuple is 0 on $L_1$, the second on $L_2$. Assume that, for each attribute $B \in L_1$ (respectively $L_2$), the subexpression of $\psi$ (resp. $\vartheta$) rooted at $B$ returns 0 at $\bar 0$. Then $\psi$ (resp. $\vartheta$) is defined at $\bar 0$ in $p$. Since, by assumption, $A \notin (L_1 \cap L_2)^+$, it must be that

$(\psi(p))(\bar 0) \ne (\vartheta(p))(\bar 0)$.

We are left with the task of proving that the subexpressions described above return 0 at $\bar 0$. Observe that this is the case if the graphs formed from each derivation tree, after removal of all edges generated by fd's not in the root relation, each have but a single non-trivial component, whose leaf attributes are, by definition, $L_1$ and $L_2$. (A trivial component is an isolated vertex.) Assume therefore that one or both of these reduced graphs have more than one non-trivial component. We will call the non-trivial components not containing the root dangerous trees*. We proceed by induction on the total number of such trees in both reduced graphs. We have already established the basis of this induction. Before continuing, we pause to assume, without loss of generality, that none of the attributes labelling the local roots are elements of the frontiers $L_1$ or $L_2$, nor of the leaf attributes of any of the dangerous trees. If $B$ were such an attribute, then we must have $B \in R$ to avoid a contradiction of part i. We may therefore remove the subtree of the original derivation tree rooted at this local root and produce a tree with at least one fewer dangerous tree. This tree has the same frontier as the original, and the induction applies to prove the proposition.

Let there be $k$ dangerous trees in the graphs, and let the dependency derived by the $i$th be $W_i \to B_i$. Observe that the induction hypothesis applies to any pair of derivations for $R \to C$ whose frontiers are subsets of any pair $W_i$, $W_j$, the case $i = j$ included.


Our method is to form a tableau for the root relation containing the two rows already generated plus k additional rows, the i-th additional row having 0's in the Wi columns and distinct variables everywhere else. Number the rows of this tableau as follows: t1 is the first row, with 0's in L1 ∪ (L1 ∩ L2)+ (overbarring denotes complement within the root relation); t2 is the second row, with 0's in L2 ∪ (L1 ∩ L2)+; and t_{i+2}, 1 ≤ i ≤ k, is the row with 0's in Wi.

† Since they endanger the proof.

We chase this tableau in the normal way. Assuming this does not reveal a contradiction, we assign a unique constant, not 0 or 1, to each variable remaining in the tableau. The result is a satisfying instance of the root relation. The value of Bi in t_{i+2} may not be 0. Change the value of Bi wherever it appears outside the root relation to t_{i+2}[Bi]. If for i ≠ j we have Bi = Bj, then Wj → Bi by induction and t_{i+2}[Bi] = t_{j+2}[Bi]. We have already established that no Bi is an element of L1, L2, nor any of the Wj's. Thus every element of L1, of L2 and of each Wi is 0 outside the root relation. Therefore each component of the reduced graph will be evaluated in the row constructed for it. Therefore ψ and η are defined in this state and are unequal, as required.

It remains only to demonstrate that we may find such a satisfying state. We first prove that for any row t_{j+2}, j > 0, and any attribute D ∉ (L1 ∩ L2)+: if t_{j+2}[D] = t_i[D], for i = 1 or 2, at any point during the chase, then there exists Vj ⊆ Wj with Vj → D and t_{j+2}[Vj] = t_i[Vj]. We prove this by induction on the length of the transformation sequence setting t_{j+2}[D] = t_i[D]. The basis is the case of a single transformation, which must be ⟨X → D, [t_{j+2}, t_i]⟩ with X ⊆ Wj. For the induction, if the transformation setting t_{j+2}[D] = t_i[D] is ⟨Y → D, [t_{s+2}, t_{u+2}]⟩ for some s, u, then one of t_{s+2}, t_{u+2}, without loss of generality t_{s+2}, has t_{s+2}[D] = t_i[D], and by induction there exists Vs ⊆ Ws with Vs → D and t_{s+2}[Vs] = t_i[Vs]. Observe that Wj → Y. By the outer induction, Vs ∩ Wj → D. Therefore, setting Vj = Wj ∩ Vs, we have t_{j+2}[Vj] = t_{s+2}[Vj] = t_i[Vj], as required.

Now we can show that the chase we outlined above proceeds without contradiction. Assume the transformation ⟨X → D, [r, s]⟩ is a contradiction‡ in this chase; i.e., r[X] = s[X] and r[D], s[D] are distinct constants. Neither of r, s is either of t1, t2. It must be that D ∉ (L1 ∩ L2)+ and, possibly after renaming, r[D] = t1[D] and s[D] = t2[D]. Therefore there are V_{r−2}, V_{s−2} as required by the above claim. Let Y = V_{r−2} ∩ V_{s−2}. We have Y → D by induction, and Y ⊆ (L1 ∩ L2)+ since

$$t_1[Y] = r[Y] = s[Y] = t_2[Y]$$

and t1, t2 agree only on (L1 ∩ L2)+. So D ∈ (L1 ∩ L2)+, a contradiction.

‡ Cf. chapter 4.

iii) We present a construction dual to that of part ii. Assume, in contradiction§ of the lemma, that M …

Form a two-tuple relation of the universal instance for R whose tuples agree on exactly … For S ∈ R, let ρ(S) be the projection onto S of these two tuples. Let ρ(R) be the single tuple whose M1 value is the projection of the first and whose M2 value is the projection of the second of these universal tuples. The remainder of the values in this tuple are irrelevant.

ψ(ρ) is defined at the value of ρ(R) through the projections of the first tuple; η(ρ) through the projections of the second. They return distinct results by construction. ■

5.3.2.3. Additional properties of Independent Schemas

The  central  open  question  of  this  chapter  is  whether  or  not  weak  cover 
embedding  and  consistency  are  together  sufficient  for  independence.  The  next 
property  we  introduce  is  too  vulgar  to  be  given  a  name.  The  conjunction  of  this 
property  and  properties  1  and  2  is  sufficient  for  independence.  This  will  prove 
uninteresting  in  itself  but  useful  for  a  later  result. 

Define the total projection operator π* to be the projection operator modified to discard projected tuples that contain variables. If T is any tableau,

$$\pi^*_X(T) = \{\, u[X] \mid u \in T \text{ and } u[X] \text{ contains no variables} \,\}.$$
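As an illustration only, the following Python sketch computes π*_X on a tableau. It assumes a representation that is not in the text: a tableau is a list of dicts from attribute names to symbols, constants are plain strings, and variables are instances of a hypothetical `Var` class.

```python
class Var:
    """Hypothetical tableau variable; compared by identity."""

    def __init__(self, name):
        self.name = name

    def __repr__(self):
        return self.name


def total_projection(tableau, X):
    """pi*_X(T): project each row onto X, discarding any projected
    tuple that still contains a variable."""
    result = set()
    for u in tableau:
        ux = tuple(u[a] for a in X)
        if not any(isinstance(s, Var) for s in ux):
            result.add(ux)
    return result


T = [{"A": "a1", "B": "b1"}, {"A": "a2", "B": Var("v1")}]
print(total_projection(T, "AB"))          # {('a1', 'b1')}
print(sorted(total_projection(T, "A")))   # [('a1',), ('a2',)]
```

The second row survives the projection onto A but not onto AB, which is exactly the difference between π* and the ordinary projection.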


§ Hopefully the dual use of the word 'contradiction' is not confusing.


Property 3. Let X → A be embedded in R. For any state ρ ∈ LSAT(F) and transformation sequence τ,

$$\pi^*_{XA}(\tau(T_\rho)) \subseteq \pi_{XA}(\rho(R)).$$

The vulgarity of property 3 is apparent. It is difficult to describe all possible transformation sequences on a given tableau, let alone all sequences over all possible states of a schema. The necessity of this property is easy to establish.
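Because property 3 quantifies over every state and every transformation sequence, it cannot be checked exhaustively; what can be checked is one state ρ together with one chased tableau τ(T_ρ). The hypothetical helper below does only that, under an illustrative representation (rows as dicts, variables as instances of a hypothetical `Var` class, ρ(R) as value tuples ordered by `scheme`). It returns the XA-values that are total in the chased tableau but missing from ρ(R); any of them witnesses a violation.

```python
class Var:
    """Hypothetical tableau variable; compared by identity."""

    def __init__(self, name):
        self.name = name

    def __repr__(self):
        return self.name


def property3_witnesses(chased_rows, relation, scheme, X, A):
    """XA-values total in one chased tableau but absent from rho(R).

    chased_rows: list of dicts (attribute -> constant or Var)
    relation:    tuples of rho(R), ordered as in `scheme`
    A non-empty result witnesses a violation of property 3.
    """
    cols = X + A
    total = {
        tuple(u[c] for c in cols)
        for u in chased_rows
        if not any(isinstance(u[c], Var) for c in cols)
    }
    present = {tuple(t[scheme.index(c)] for c in cols) for t in relation}
    return total - present


rows = [{"X": "x1", "A": "a1", "B": "b1"}, {"X": "x2", "A": "a2", "B": Var("v1")}]
print(property3_witnesses(rows, [("x1", "a1")], "XA", "X", "A"))  # {('x2', 'a2')}
```

An empty result for one (ρ, τ) pair says nothing about the others, which is the sense in which the property is "vulgar".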

Lemma 5.11. Any independent schema enjoys property 3.

Proof. Otherwise, choose a value

$$xa \in \pi^*_{XA}(\tau(T_\rho)) - \pi_{XA}(\rho(R))$$

and insert a tuple t into ρ(R) with t[X] = x, t[A] ≠ a. The result is in LSAT(F) − SATW(F). ■

Proposition 5.4. A consistent, weakly cover embedding schema having property 3 is independent.

Proof. Let ρ be a locally satisfying state of a consistent, weakly cover embedding schema having property 3. By induction on |ρ|, the number of tuples in ρ, we prove that the first contradiction cannot happen.

The basis is trivial. For the induction, choose any tuple from any non-empty relation, say ρ(Ri), and chase the rest. By the hypothesis, no contradiction arises. By property 3, we may find a weak instance for this subset of ρ such that, when the remaining tuple (augmented with ndv's) is added to that weak instance, no contradicted transformation is immediately enabled. We are now calculating functions within a single instance, as in chapter 4. If any contradiction occurs, it must involve the added row, t. If t[A] is a constant for A ∈ U − Ri, then some expression for Ri → A is defined, cf. chapter 4. All such expressions return the same result, by consistency. Therefore the first contradiction cannot arise on an element of U − Ri.


Assume the first contradiction is on an attribute A ∈ Ri. By induction on the number of transformations before the contradiction (by the choice of weak instance this is not the first transformation) and the fact that this is the first contradiction, the contradicted transformation must be of the form ⟨X → A, [t, u]⟩, and t[X] is the result of some derivation expression for Ri → X. Thus the derivation expression built from these and X → A is inconsistent with the empty derivation expression formed by augmenting A → A. ■

In  the  proof  of  this  proposition,  property  3  plays  a  minor,  although  vital, 
role.  The  proof  is  more  interesting  than  the  proposition,  as  property  3  can  be 
made  to  play  a  much  more  important  role. 

Theorem  5.4.  A  weakly  cover  embedding  scheme  having  property  3  is 
independent. 

Proof. Assume ⟨X → A, [r, s]⟩ is a contradiction. It can't be that r[XA] is all constant, for then r[XA] ∈ π_{XA}(ρ(R)), R a scheme embedding XA, and r[A] ≠ s[A] implies s[XA] ∉ π_{XA}(ρ(R)), violating property 3.

It is possible to supply constants for the variables of r, forming r′, in such a way that r′[Ri] may be inserted into ρ(Ri) for each Ri, with the result ρ′ ∈ LSAT(F). Since ρ′ ⊇ ρ, there is a τ such that in τ(T_{ρ′}), r′[X] = s[X] and r′[XA] ∈ π_{XA}(ρ′(R)), by the fact that all dependencies used in the chase are embedded. But still r[A] ≠ s[A], and property 3 is violated as before. ■

The next property we introduce will prove interesting in its own right. We name it extendibility.

Property 4. For all states ρ, all dependencies f: R → A (R a relation scheme), and values v, if f_ρ(v) is defined then f_ρ(v) = (ψ(ρ))(v) for some strong derivation expression ψ.

If a scheme has this property, the value of any function whose left-hand side is a relation scheme or subset thereof can be calculated in the state, without recourse to the tableau. Hence the name.

Lemma 5.12. Extendibility is necessary for independence.

Proof. Assume R is not extendible. For ρ a state of R, ρ ∉ LSAT(F) implies ρ ∉ SATW(F), and then f_ρ is everywhere undefined. Therefore let ρ ∈ LSAT(F) and assume f_ρ(v) is defined but not equal to (ψ(ρ))(v) for any ψ. Therefore R → A is not trivial and A ∉ R.

Let t_v denote a tuple with t_v[R] = v and t_v[U − R] ndv's not appearing in ρ. If f_ρ(v) is defined, then there is a transformation sequence τ on T_ρ ∪ {t_v} setting t_v[A] = f_ρ(v). Within τ there is a subsequence ξ of transformations, each of which involves t_v, and whose fd's form a strong derivation tree for R → A. Let ψ be the expression for this derivation. We say that a transformation ⟨X → B, [t, u]⟩ is total if u[XB] contains no variables. If all the transformations of ξ are total, then property 3 is violated, as otherwise (ψ(ρ))(v) = f_ρ(v).

Let |ξ| = l. We examine ξ so as to form a sequence ρ1, ..., ρl of states in LSAT(F) with ρl ⊇ ρ and (ψ(ρl))(v) defined and distinct from f_ρ(v). A contradiction will therefore be uncovered in the chase of ρl, showing that R is not independent.

Within ξ, the A_i-descendant of a transformation τ_m, if it exists, is the transformation of ξ that precedes τ_m and derives its A_i-value.

Let τ_m = ⟨A → B, [u, w]⟩ be the m-th transformation of ξ, where A = A1 ⋯ Ak, and assume the preceding transformations have been processed, producing a state ρ_{m−1} as required. (Let ρ0 = ρ.) Let J ⊆ {1, ..., k} be such that j ∈ J if examination of the Aj-descendant of τ_m, if it exists, caused a tuple to be inserted in some relation. Note that if m = 1 or τ_m is enabled in T_ρ then J = ∅, and that if w[Ai] is a variable then i ∈ J. If J = ∅ and w[B] is a constant, ρ_m = ρ_{m−1}. Otherwise, if J = ∅ and w[B] is a variable, a tuple t with t[A] = w[A] and t[B] not appearing in ρ_{m−1} is inserted into any relation embedding AB to form ρ_m. (The first non-total transformation of ξ satisfies this case.) If J ≠ ∅, then for j ∈ J let a_j be the Aj-value in the tuple of ρ_{m−1} created during the processing of the Aj-descendant of τ_m. Let a = ⟨a1, ..., ak⟩, where ai = w[Ai] for i ∉ J and ai is the value just defined for i ∈ J. If not already present, insert a tuple t with t[A] = a and t[B] a value not appearing in ρ into the relation embedding AB to form ρ_m.

ρ_m ∈ LSAT(F) since t[A] ∉ π_A(ρ_{m−1}(R)) for any R ⊇ AB. For the case J = ∅, in which case w[A] is total but w[B] is a variable, this is a consequence of τ being a transformation sequence on T_ρ ∪ {t_v}. The case J ≠ ∅ is by construction. Since there is at least one non-total transformation, ρl ⊋ ρ. ■

For f_ρ(v) to be defined there must exist a weak instance for ρ with the value v appearing in some tuple. Property 4a removes this restriction.

Property 4a. For all states ρ ∈ LSAT(F) and any row t ∈ T_ρ, if the symbol t[A] becomes the constant a in τ(T_ρ) for any sequence τ which finds no contradiction, then there is some derivation expression ψ for R_t → A such that (ψ(ρ))(t[R_t]) = a.

Property 4a is vulgar in the sense that property 3 is vulgar. It is conjectured that property 4a is enjoyed by exactly the extendible schemas. Note that extendibility is a special case of property 4a.

Theorem 5.5. A consistent, weakly cover embedding scheme having property 4a is independent.

Proof. We will show such a scheme has property 3. If R has property 3 with respect to LSAT(F) then we are done, so assume the existence of a state ρ ∈ LSAT(F) which demonstrates the violation of property 3. In particular, let X → A be a functional dependency, R the unique scheme with XA ⊆ R, and let xa ∈ π*_{XA}(τ(T_ρ)) for τ a transformation sequence, with xa ∉ π_{XA}(ρ(R)). For some row t ∈ σ_{XA=xa}(τ(T_ρ)), let v = t[R_t] be its original, constant value in T_ρ. Then f: R_t → A ∈ F+, and by property 4a there is some expression ψ_f with (ψ_f(ρ))(v) = a. Similarly, there are expressions ψ_i such that (ψ_i(ρ))(v) = x_i for each x_i ∈ x. The expression

$$\psi' = \pi_A\Big(\big(\mathop{*}_i\, (\psi_i(\rho))(v)\big) * \rho(R)\Big)$$

is strongly equivalent to an expression for R_t → A which is distinct from ψ_f. However, ψ_f and ψ′ are not consistent: insert into ρ(R), if not already present, a tuple u with u[X] = x and u[A] ≠ a. ■

Property 4a is not as strong as property 3. In particular, the conjunction of weak cover embedding and property 4a is not sufficient for independence, as is witnessed by

R = {AB, BC, AC}
F = {A → C, B → C}

which is not consistent. The system

R = {AB, BDE, C}
F = {A → C, B → C, CD → E}

is consistent, extendible and has property 3, but is not weakly cover embedding and is not independent.
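The first example can be checked mechanically with the fd chase used throughout this chapter. The sketch below is a minimal Python rendering of that chase under assumptions of our own: rows are dicts, variables are instances of a hypothetical `Var` class, and each fd is an `(lhs, rhs)` pair of attribute strings with a single right-hand attribute. The state ρ(AB) = {⟨a, b⟩}, ρ(AC) = {⟨a, c1⟩}, ρ(BC) = {⟨b, c2⟩} satisfies each fd locally, yet its chase must equate c1 and c2, so it is locally satisfying but not satisfying.

```python
from itertools import combinations


class Var:
    """Hypothetical tableau variable (a nondistinguished symbol)."""

    def __init__(self, name):
        self.name = name

    def __repr__(self):
        return self.name


class Contradiction(Exception):
    """Raised when the chase would equate two distinct constants."""


def tableau(state, universe):
    """T_rho: each tuple of each relation, padded with fresh variables."""
    rows, n = [], 0
    for scheme, tuples in state.items():
        for tup in tuples:
            row = {}
            for attr in universe:
                if attr in scheme:
                    row[attr] = tup[scheme.index(attr)]
                else:
                    n += 1
                    row[attr] = Var(f"v{n}")
            rows.append(row)
    return rows


def chase(rows, fds):
    """Apply each fd X -> A until nothing changes.

    Rows agreeing on X get their A-symbols identified: a variable is
    replaced everywhere (by the constant, if there is one); two
    distinct constants raise Contradiction.
    """
    rows = [dict(r) for r in rows]
    changed = True
    while changed:
        changed = False
        for lhs, a in fds:
            for t, u in combinations(rows, 2):
                if any(t[b] != u[b] for b in lhs) or t[a] == u[a]:
                    continue
                if not isinstance(t[a], Var) and not isinstance(u[a], Var):
                    raise Contradiction(f"{lhs}->{a}: {t[a]} vs {u[a]}")
                old, new = (t[a], u[a]) if isinstance(t[a], Var) else (u[a], t[a])
                for r in rows:
                    for c in r:
                        if r[c] is old:
                            r[c] = new
                changed = True
    return rows


fds = [("A", "C"), ("B", "C")]
bad = {"AB": [("a", "b")], "AC": [("a", "c1")], "BC": [("b", "c2")]}
good = {"AB": [("a", "b")], "AC": [("a", "c")], "BC": [("b", "c")]}
try:
    chase(tableau(bad, "ABC"), fds)
except Contradiction as e:
    print("not satisfying:", e)  # not satisfying: B->C: c1 vs c2
print([r["C"] for r in chase(tableau(good, "ABC"), fds)])  # ['c', 'c', 'c']
```

This naive version rescans every pair of rows until a fixpoint; it favours readability over efficiency.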


Chapter  6 

Summary  and  Future  Work 

6.1.  Summary 

We have, as we said we would, extended the theory of databases to include multi-relation states not necessarily the projections of any single instance. We have found that certain things change considerably as a result. We discovered that the standard notion of satisfaction is, in fact, the conjunction of two weaker notions, which we called weak satisfaction and completeness. We saw that functional dependencies play two distinct roles, that of function and that of constraint, which in the world of single relations are inseparable. We did show, in chapter 3, that the results about expression equivalence due to Aho et al. are substantially unchanged in the new setting. It is instructive to contrast our other results with the exposition of database theory given in [BBG], the first prominent work to call for a weakening of the assumption of join consistency.

Two  desirable  properties  of  schema  designs  are  presented  in  [BBG].  One  is 
cover  embedding:  the  embedded  dependencies  of  a  cover  embedding  schema 
are  Armstrong-equivalent  to  the  entire  set  of  user  supplied  dependencies.  We 
have shown that this property is stronger than necessary in two different settings. For a schema to represent all of a set of dependencies, it need only
preserve  them  in  the  sense  of  [BMSU];  for  a  schema  to  be  independent,  it  must 
be  weakly  cover  embedding.  However,  suppose  we  wish  a  schema  which 
represents  all  the  dependencies  and  is  independent  with  respect  to  them.  Such 
a  schema  must  be  cover  embedding,  as  the  reader  may  wish  to  verify. 

The second property of [BBG] is losslessness: the projection-join mapping given by the schema is the identity on satisfying universal instances. We have not presented any justification for this property. However, the interested reader is referred to [Mend], which answers the question: Can a given schema represent all the states of another, given schema? The key feature of the answer involves lossless decompositions. Therefore, suppose the user wishes to store at least the information in any satisfying universal instance, as well perhaps as information which cannot be so stored. The schema he designs should be lossless. Therefore it will be dependency preserving and will represent the entire set of user-supplied dependencies. It will be necessary to use all of any non-redundant cover when determining the consistency of any database state via the techniques of chapter 2. Should the user wish to ensure that the individual relation instances in every state be separately updatable in the sense of chapter 5, he will have to take pains to embed a cover of his dependencies in the relation schemes of the schema. So the conditions of [BBG] are relevant. They are not, as was shown in chapter 5, enough. The assumption of join consistency is quite strong.

6.2.  Future  Work 

Here  we  present  a  list  of  open  questions  prompted  by  the  investigations  of 
this  thesis. 

Q1. Suppose that the language L′_R is the language L_R of chapter 2 without a universal predicate letter. For an arbitrarily chosen set of dependencies, is it possible to construct a set of axioms B such that for every state ρ, B_ρ is consistent if and only if A_ρ is consistent?

In the place of the containing instance axioms of L_R, it is possible to construct in L′_R axioms which assert that the model, as opposed to the state, is join consistent. However, what of dependencies which constrain the states but are embedded nowhere? Can they be expressed in L′_R? Can Q1 be answered 'yes' if R is weakly cover embedding?

Q2. In arbitrary instances, can we decide the question 'Does a given functional dependency act as a constraint in a given context?' If so, in what amount of time and space?

Consider the system R = {XA, B1B2}, F = {X → B1B2, B1 → A, B2 → A}. Note that for every f ∈ F, SATW(R, F) = SATW(R, F − {f}). However, it is not the case that SATW(R, F) = SATW(R, ∅). Suppose that we use the symbol ⊨_R to mean: F ⊨_R f iff every state of R satisfying with respect to F is satisfying with respect to f. Then from F ⊨_R f and F ⊨_R g we may not conclude F ⊨_R f ∧ g. This makes the development of inference rules for ⊨_R problematical.

Q3. A test for weak cover embedding is needed. The conjectured characterization of independent schemas should be proven or disproven. Also, the extension to other kinds of dependencies should be attempted.

Consider the set of functional dependencies {AB → C, C → X, X → B}. Upon inspection, it would appear that the only independent schema constrained by exactly this set of dependencies is the universal scheme. This is not even in 2NF. Independence may prove in practice to be too restrictive a requirement. A more useful approach is an algorithm which performs as follows:

Suppose ρ is a state known to satisfy a given set of constraints. Let t be a tuple to be inserted. The algorithm retrieves from ρ ∪ {t} a subset σ such that

i) σ is satisfying iff ρ ∪ {t} is satisfying;

ii) for all σ′ satisfying condition i, |σ| ≤ |σ′|.

Such an algorithm would consume no more resources than are essential for the maintenance of the given state, ρ.


Appendix 

Definitions 


A.1. Basics

The universe is a finite collection of attributes. It is conventional to use letters from the beginning of the alphabet, A, B, A1, A2, ..., to denote single attributes, and letters towards the end of the alphabet, X, Y, X1, X2, ..., to denote sets of attributes. The union operator is normally elided: thus, XY = X ∪ Y. The distinction between the single attribute A and the set {A} is frequently ignored. The letter U ordinarily denotes the universe.

For each attribute A ∈ U, there is a collection of values called the domain of A, or dom(A). Within this thesis, dom(A) will always be countably infinite. Within 'real' databases, the size of the potential value set is limited by storage considerations. In most cases it is tacitly assumed that the pool of potential values is never exhausted: it is always possible to hire a new employee or open a new account. There are admittedly cases in which the infinite domain assumption is not realistic: sex, marital status, military rank, etc. The effects of finite domains are not treated in this thesis.

A relation scheme is a subset of the universe. The letters R, S, R1, R2, ... are used to denote relation schemes. An instance r of a scheme R = {A1, ..., An} is an object which may be defined in either of two ways.

i) r ⊆ dom(A1) × ⋯ × dom(An). That is, an instance is a subset of the cross product of the underlying domains.

ii) Let Φ be the collection of functions

$$\Phi = \Big\{\, \varphi : R \to \bigcup_{i=1}^{n} \mathrm{dom}(A_i) \;\Big|\; \varphi(A_i) \in \mathrm{dom}(A_i),\ 1 \le i \le n \,\Big\}.$$

Then r ⊆ Φ.


In either case, the elements of an instance are called tuples, which name is motivated by the first definition. We call the first definition the cross product or column number definition of a relation instance; the second we call the attribute definition. The difference is the effect of ordering on the attributes. By the column number definition, relation schemes are more properly ordered tuples than unordered sets of attributes. For most purposes, the choice of definition is irrelevant: we may assign a fixed ordering to the attributes to convert them to column numbers, or assign attributes to column numbers to convert the other way. We will feel free to choose the definition appropriate to a particular discussion as needed.
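A small Python sketch of the two conversions just described, assuming (for illustration only) that a column-number tuple is a Python tuple, an attribute-definition tuple is a dict, and the fixed attribute ordering is supplied as a sequence:

```python
def to_attribute_form(row, order):
    """Column-number tuple -> attribute-definition tuple (a dict)."""
    return dict(zip(order, row))


def to_column_form(t, order):
    """Attribute-definition tuple -> column-number tuple under `order`."""
    return tuple(t[a] for a in order)


order = ("Husband", "Wife")
t = to_attribute_form(("Marc", "Brenda"), order)
print(t)                                        # {'Husband': 'Marc', 'Wife': 'Brenda'}
print(to_column_form(t, ("Wife", "Husband")))   # ('Brenda', 'Marc')
```

Only the column form depends on the chosen ordering; the dict form does not.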

Tuples are normally displayed according to the column number definition. So the tuples ⟨Marc, Brenda⟩, ⟨Dennis, Vicki⟩, ⟨Alberto, Maria⟩ may appear in a Marriage relation instance on the attributes ⟨Husband, Wife⟩. The instance as a whole is normally displayed in a tabular format motivated by the attribute definition, as

    Husband   Wife
    Marc      Brenda
    Dennis    Vicki
    Alberto   Maria

A database schema is a collection of relation schemes. Bold letters R, S are reserved for database schemas. A database state is an assignment of a relation instance to each relation scheme of a database schema. Conventionally, lower case Greek letters from the middle of the alphabet denote database states. Thus for ρ a state and R ∈ R, ρ(R) is the instance of the scheme R assigned in the state ρ. We will often speak of a state ρ as though it were the set of instances {ρ(R) | R ∈ R} rather than a function from schemes to instances.


A.2.  The  Relational  Algebra 

From  a  set  of  instances,  we  may  form  new  instances  with  the  operators  of 
the  relational  algebra. 

Let t ∈ ρ(R). (That is, let t be a tuple of the instance for R in ρ.) We use square brackets to denote function evaluation; thus the A-value of t is t[A] ∈ dom(A) for each A ∈ R. For a set X, t[X] is the restriction of the function t to the attributes of X. We assume X ⊆ R for t[X] to be well defined. We may now define the operator projection.

Let X be a (possibly empty) set of attributes, r an instance. The projection of r onto X is

$$\pi_X(r) = \{\, t[X] \mid t \in r \,\}.$$

For X = ∅, this becomes π_∅(r) = {⟨⟩}, the set containing the empty tuple, for any non-empty r (for empty r, π_∅(r) = ∅).
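A minimal Python sketch of t[X] and π_X under the attribute definition. The encoding is an assumption for illustration: a tuple is a finite function stored as a frozenset of (attribute, value) pairs, so an instance can be a Python set, and the empty tuple ⟨⟩ is `frozenset()`.

```python
def tup(**values):
    """Attribute-definition tuple: a frozenset of (attribute, value) pairs."""
    return frozenset(values.items())


def restrict(t, X):
    """t[X]: the restriction of the function t to the attributes in X."""
    return frozenset((a, v) for a, v in t if a in X)


def project(r, X):
    """pi_X(r) = { t[X] | t in r }."""
    return {restrict(t, X) for t in r}


marriage = {tup(Husband="Marc", Wife="Brenda"), tup(Husband="Dennis", Wife="Vicki")}
print(project(marriage, set()))   # {frozenset()}: only the empty tuple
print(project(set(), set()))      # set(): an empty instance projects to nothing
```

The two calls on X = ∅ show the distinction between non-empty and empty r.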

If s = π_X(r), then s is an instance of the scheme X. If R = {X1, ..., Xn} is a collection of sets of attributes, we write

$$\pi_{\mathbf{R}}(r) = \{\, \pi_{X_1}(r), \ldots, \pi_{X_n}(r) \,\}.$$

Thus π_R(r) is a state of the schema R.

Where  the  projection  operator  takes  ‘vertical’  slices  of  an  instance,  the 
selection  operator  constructs  a  horizontal  slice. 

Let X ⊆ R, r an instance of R. Let x be an X-tuple (also called an X-value; in short, a tuple defined precisely on the attributes X). Then

$$\sigma_{X=x}(r) = \{\, t \mid t \in r \wedge t[X] = x \,\}.$$

If X = {X1, ..., Xk} and x = ⟨x1, ..., xk⟩, we may also write σ_{X1=x1 ∧ ⋯ ∧ Xk=xk}(r).
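Selection can be sketched the same way. Again this is an illustrative encoding (tuples as frozensets of (attribute, value) pairs), not notation from the text; X is read off the X-tuple x itself.

```python
def tup(**values):
    """Attribute-definition tuple: a frozenset of (attribute, value) pairs."""
    return frozenset(values.items())


def select(r, x):
    """sigma_{X=x}(r): the tuples t of r with t[X] = x, where X is the
    set of attributes on which the X-tuple x is defined."""
    X = {a for a, _ in x}
    return {t for t in r if frozenset((a, v) for a, v in t if a in X) == x}


marriage = {
    tup(Husband="Marc", Wife="Brenda"),
    tup(Husband="Dennis", Wife="Vicki"),
    tup(Husband="Alberto", Wife="Maria"),
}
print(len(select(marriage, tup(Husband="Dennis"))))   # 1
```

A multi-attribute x plays the role of the conjunctive form σ_{X1=x1 ∧ ⋯ ∧ Xk=xk}.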




The  next  operator  takes  two  relation  instances  and  forms  a  wider  result 
from  them. 

Let r be an instance of R, s of S. Then the (natural) join of r and s is

$$r * s = \{\, t \mid t[R] \in r \wedge t[S] \in s \,\}.$$

Note that RS is the scheme of r * s. The following is clear and the proof is left to the reader.

Fact A.1. For instances r, s, t:

i) r * s = s * r (commutativity)

ii) (r * s) * t = r * (s * t) (associativity) ■
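A minimal Python sketch of the natural join, using the same illustrative frozenset-of-pairs encoding of tuples. It builds every combination of a tuple of r and a tuple of s that agree on their common attributes. The checks at the end exercise Fact A.1 on one example; they illustrate the fact and do not prove it.

```python
def tup(**values):
    """Attribute-definition tuple: a frozenset of (attribute, value) pairs."""
    return frozenset(values.items())


def join(r, s):
    """Natural join r * s: every RS-tuple t with t[R] in r and t[S] in s."""
    out = set()
    for t in r:
        dt = dict(t)
        for u in s:
            du = dict(u)
            if all(dt[a] == du[a] for a in dt.keys() & du.keys()):
                out.add(frozenset({**dt, **du}.items()))
    return out


r = {tup(A="a1", B="b1"), tup(A="a2", B="b1")}
s = {tup(B="b1", C="c1"), tup(B="b2", C="c2")}
q = {tup(C="c1", D="d1")}
print(join(r, s) == join(s, r))                      # True
print(join(join(r, s), q) == join(r, join(s, q)))    # True
```

Joining with {⟨⟩} (here `{frozenset()}`) returns the other operand unchanged.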

From this fact, we are justified in writing, for any set of instances r = {r1, ..., rn}, the expression r1 * ⋯ * rn, or *_{i=1}^{n} ri, to stand for the result of any sequence of joins on these instances.

These  are  the  only  operators  we  will  need  for  the  investigations  of  this 
thesis.  For  a  fuller  account,  see  Ullman  [U]  or  Tsichritzis  and  Lochovsky  [TLl]. 


A.3. Interpretation

Up till now we have been proceeding purely syntactically. We must recognize that a database is a repository of information about something. We need not concern ourselves overly with the thing (frequently called the real world, if there were such a beast) described by the database. We assume the user's interest is the thing described; our interest is in the description.


Consider the sentence "Marc and Brenda are husband and wife." The fragment "are husband and wife" is called the predicate of this sentence. Now consider the object Marriage(⟨Marc, Brenda⟩). The name Marriage is a predicate said to hold for ⟨Marc, Brenda⟩. We take the view that the user's description of the real world is a set of predicates; furthermore, each relation scheme of the schema represents the "intention" of one such predicate. The tuples of an instance represent values (entities, things, points) about which the predicate is true, or at least, thought to be true.

Let R = {R1, ..., Rk} be a schema and let Pi be the predicate for Ri. Thus t ∈ ρ(Ri) implies that Pi(t) is (believed to be) true*. For any expression over selection, projection and join there is a simple means of expressing the predicate of the result in the predicates of the operands. Thus,

1) If P is the predicate of R, that is, for r an instance of R,

$$t \in r \Rightarrow P(t),$$

then

$$u \in \pi_X(r) \Rightarrow (\exists v)\, P(uv),$$

where v is an (R − X)-tuple.

2) For instances formed by selection,

$$u \in \sigma_{X=x}(r) \Rightarrow P(u) \wedge u[X] = x.$$

3) If S is the predicate of an instance s, then

$$u \in r * s \Rightarrow P(u) \wedge S(u).$$

Note that we do not write

$$t \in r \Leftrightarrow P(t),$$

which might well require all true facts to be recorded in a state of the database.

A.4. Dependencies

As the states of databases are the descriptions of some independent phenomenon, they exhibit a certain regularity or predictability which reflects that phenomenon. It may be advantageous for the user to allow the database management system to exploit these regularities. To do so, there must be a means of describing these to the system. Such a means is the class of statements called data dependencies. We will describe the 'classic' dependencies here.

* We are not very concerned with the difference between belief and reality. Still, it is best not to believe things are cleaner than they are.

There are often cases in which a predicate describes a functional relationship. Thus an employee has a single salary; a husband has one wife. For these cases we have the functional dependencies, or fd's.

We say that X functionally determines Y, written X → Y, in an instance r if for all tuples t, u ∈ r, if t[X] = u[X] then t[Y] = u[Y].

This definition is of a property which may be predicated of an instance of a relation. Suppose there are two salaries recorded for an employee, although the fd Emp → Sal has been asserted. The system can conclude from this that something is incorrectly specified. Dependencies such as fd's are called constraints, as they disallow certain states.
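The definition of X → Y translates directly into a one-pass check. In this Python sketch rows are dicts, X and Y are lists of attribute names, and the salary figures are invented purely for illustration.

```python
def satisfies_fd(r, X, Y):
    """True iff X -> Y holds in r: rows that agree on X agree on Y."""
    seen = {}
    for t in r:
        key = tuple(t[a] for a in X)
        val = tuple(t[a] for a in Y)
        # setdefault records the first Y-value seen for this X-value.
        if seen.setdefault(key, val) != val:
            return False
    return True


payroll = [
    {"Emp": "Marc", "Sal": 100},
    {"Emp": "Dennis", "Sal": 120},
]
print(satisfies_fd(payroll, ["Emp"], ["Sal"]))  # True
payroll.append({"Emp": "Marc", "Sal": 90})      # a second salary for Marc
print(satisfies_fd(payroll, ["Emp"], ["Sal"]))  # False
```

The second call is the situation described above: the asserted fd Emp → Sal now rules the state out.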

Not all interesting relationships are functional. The salary history of an employee is a set of salaries and dates. The relationship of an employee to his salary history is a multi-valued dependency, or mvd.

We say that X multi-determines Y, written X →→ Y, in an instance r of a scheme R = XYZ, if the appearance of the tuples ⟨x, y1, z1⟩ and ⟨x, y2, z2⟩ in r implies that the tuple ⟨x, y1, z2⟩ appears in r.

The most interesting and important fact about mvd's is the following.

Fact A.2. The mvd X →→ Y holds in an instance r of XYZ if and only if

$$r = \pi_{XY}(r) * \pi_{XZ}(r).$$

Mvd's are revealed by this fact to be binary join dependencies.
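To see Fact A.2 at work, the Python sketch below tests an mvd two ways, once by the tuple-exchange definition and once by comparing r with π_XY(r) * π_XZ(r). The representation (rows as dicts, single-letter attribute names, schemes as strings) is an assumption for illustration; agreement on two small instances is an illustration, not a proof.

```python
def proj(r, attrs):
    """pi_attrs(r) as a set of value tuples, in the order of attrs."""
    return {tuple(t[a] for a in attrs) for t in r}


def satisfies_mvd(r, X, Y, scheme):
    """X ->> Y by definition: <x,y1,z1>, <x,y2,z2> in r imply <x,y1,z2> in r."""
    Z = [a for a in scheme if a not in X and a not in Y]
    full = proj(r, scheme)
    for t in r:
        for u in r:
            if all(t[a] == u[a] for a in X):
                w = {a: (u[a] if a in Z else t[a]) for a in scheme}
                if tuple(w[a] for a in scheme) not in full:
                    return False
    return True


def splits_losslessly(r, X, Y, scheme):
    """Fact A.2's test: r == pi_XY(r) * pi_XZ(r)."""
    Z = [a for a in scheme if a not in X and a not in Y]
    XY, XZ = list(X) + list(Y), list(X) + Z
    joined = set()
    for p in proj(r, XY):
        for q in proj(r, XZ):
            dp, dq = dict(zip(XY, p)), dict(zip(XZ, q))
            if all(dp[a] == dq[a] for a in X):
                merged = {**dp, **dq}
                joined.add(tuple(merged[a] for a in scheme))
    return joined == proj(r, scheme)


r1 = [{"X": "x", "Y": "y1", "Z": "z1"}, {"X": "x", "Y": "y2", "Z": "z2"}]
r2 = r1 + [{"X": "x", "Y": "y1", "Z": "z2"}, {"X": "x", "Y": "y2", "Z": "z1"}]
for r in (r1, r2):
    print(satisfies_mvd(r, "X", "Y", "XYZ"), splits_losslessly(r, "X", "Y", "XYZ"))
# False False
# True True
```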


Let {X1, ..., Xn} be a set of subsets of a scheme R such that

$$\bigcup_{i=1}^{n} X_i = R.$$

Then an instance r of R satisfies the join dependency *[X1, ..., Xn] if

$$r = \pi_{X_1}(r) * \cdots * \pi_{X_n}(r).$$
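A direct Python rendering of the jd test, again under an illustrative representation (rows as dicts, components as strings of single-letter attributes whose union is assumed to be the scheme). The join is folded starting from {⟨⟩}, the identity for natural join. The example is the instance behind the first-order jd sentence in A.6: three premise tuples without the conclusion violate *[XY, YZ, XZ], and adding ⟨x1, y1, z1⟩ repairs it.

```python
def satisfies_jd(r, components, scheme):
    """True iff r == pi_{X1}(r) * ... * pi_{Xn}(r)."""
    def projection(attrs):
        values = {tuple(t[a] for a in attrs) for t in r}
        return [dict(zip(attrs, v)) for v in values]

    def as_rows(rows):
        return {tuple(t[a] for a in scheme) for t in rows}

    acc = [{}]  # {<>}: the instance holding only the empty tuple
    for comp in components:
        acc = [
            {**a, **b}
            for a in acc
            for b in projection(comp)
            if all(a[k] == b[k] for k in a.keys() & b.keys())
        ]
    return as_rows(acc) == as_rows(r)


r = [
    {"X": "x1", "Y": "y1", "Z": "z2"},
    {"X": "x2", "Y": "y1", "Z": "z1"},
    {"X": "x1", "Y": "y2", "Z": "z1"},
]
print(satisfies_jd(r, ["XY", "YZ", "XZ"], "XYZ"))  # False
r.append({"X": "x1", "Y": "y1", "Z": "z1"})
print(satisfies_jd(r, ["XY", "YZ", "XZ"], "XYZ"))  # True
```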

Let D be a set of dependencies. We say that D logically implies d, and write D ⊨ d, if every relation instance which satisfies every dependency in D also satisfies d. Let D+ = {d | D ⊨ d}. The implication problem is the recognition problem for the set {⟨D, d⟩ | D ⊨ d}. For the dependencies in this section, the implication problem is decidable. It is decidable in polynomial time whenever D contains mvd's and fd's only and d is an mvd or fd [Be].
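For the special case in which D contains fd's only, implication reduces to the standard attribute-closure computation: D ⊨ X → Y iff Y ⊆ X+. The Python sketch below is that textbook algorithm, not the mvd procedure of [Be]; fd's are represented, as an assumption for illustration, by (lhs, rhs) strings of single-letter attributes.

```python
def closure(X, fds):
    """X+ under a set of fds given as (lhs, rhs) attribute strings."""
    result = set(X)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def implies(fds, lhs, rhs):
    """D |= lhs -> rhs iff rhs is contained in lhs+ (fd's only)."""
    return set(rhs) <= closure(lhs, fds)


F = [("A", "B"), ("B", "C")]
print(sorted(closure("A", F)))   # ['A', 'B', 'C']
print(implies(F, "A", "C"))      # True
print(implies(F, "C", "A"))      # False
```

Each pass over the fd's either adds an attribute or stops, so this naive loop is polynomial; linear-time refinements exist but are not shown.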

A set of dependencies C is said to cover a set D (or to be a cover of D) if C+ = D+. C is said to be a non-redundant cover of D if no proper subset of C is a cover of D.


A dependency d is embedded in a relation scheme R if the attributes in d all appear in R. Let F be a set of fd's only. Let F_R be the subset of F+ embedded in R. We say a schema R is cover embedding (wrt F) if

$$\Big(\bigcup_{R \in \mathbf{R}} F_R\Big)^+ = F^+.$$
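A brute-force Python sketch of the cover embedding test, assuming fd's are (lhs, rhs) strings of single-letter attributes and each scheme is a string. It uses the fact that {Y → (Y+ ∩ R) | Y ⊆ R} is a cover of F_R; this enumeration is exponential in |R| and is meant only for small examples. Since each F_R ⊆ F+, the equation holds exactly when every fd of F is implied by the union. The second schema is the one from the end of chapter 5, which is not weakly cover embedding and hence not cover embedding.

```python
from itertools import combinations


def closure(X, fds):
    """X+ under a set of fds given as (lhs, rhs) attribute strings."""
    result = set(X)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def embedded_fds(scheme, fds):
    """A cover of F_R: Y -> (Y+ & R) for every non-empty Y within R."""
    out = []
    for k in range(1, len(scheme) + 1):
        for Y in combinations(scheme, k):
            rhs = closure(Y, fds) & set(scheme)
            out.append(("".join(Y), "".join(sorted(rhs))))
    return out


def is_cover_embedding(schema, fds):
    """(union of the F_R)+ == F+, tested fd by fd."""
    g = [fd for R in schema for fd in embedded_fds(R, fds)]
    return all(set(rhs) <= closure(lhs, g) for lhs, rhs in fds)


print(is_cover_embedding(["AB", "BC"], [("A", "B"), ("B", "C")]))   # True
print(is_cover_embedding(["AB", "BDE", "C"],
                         [("A", "C"), ("B", "C"), ("CD", "E")]))    # False
```

In the second case the only non-trivial embedded fd is BD → E, from which A → C cannot be derived.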


A.5.  Some  elementary  notions  of  logic 

A first order language with equality, L, is a triple L = ⟨P, F, C⟩ together with a set of logical symbols ∀, ∧, ¬, = etc. P is the set of predicate letters, F the set of function letters, C the set of constants of L. We will assume the reader to be familiar with the syntax of first order languages. We will briefly and informally introduce the semantics of such languages.

A structure for the language L is a mapping or interpretation of the symbols of L. Thus a structure $\mathfrak{A}$ provides

• a non-empty set A (the universe of $\mathfrak{A}$);

• for each n, for each n-ary predicate symbol in P, an n-ary relation, that is, a subset of $A^n$;

• for each n-place function symbol in F, a function $f : A^n \to A$;

• for each element of C, an element of A.

A structure is an interpretation of a language: it gives a possible meaning to the formulae of the language. Truth is defined with respect to structures. The sentence $\sigma = (\forall x)(\varphi(x))$ is said to be true in the structure $\mathfrak{A}$ if $\varphi(a)$ is true for every element $a \in A$. A structure is called a model of any sentence or set of sentences which are true in it. An isomorphic image of the structure $\mathfrak{A}$ is the result of renaming the elements of A in an injective (1-to-1) way everywhere they appear. Two isomorphic structures are models of exactly the same sentences.

It  is  no  great  leap  of  the  imagination  to  think  of  a  relational  database  state 
as  a  structure  for  some  language. 
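To make this concrete, the Python sketch below evaluates a universally quantified sentence in a finite structure by trying every assignment. It assumes the universe is the (non-empty) set of values occurring in the relation and interprets P as the relation itself; the function names are hypothetical. Brute-force evaluation takes $|A|^n$ steps for n quantified variables, so it only suits tiny examples. The second value printed for each state checks the same sentence in an isomorphic image.

```python
from itertools import product


def true_forall(universe, arity, phi):
    """(forall v1 ... vn) phi is true in a finite structure iff phi holds
    under every assignment of universe elements to v1 ... vn."""
    return all(phi(*vals) for vals in product(universe, repeat=arity))


def fd_sentence(P):
    """phi(x, y1, y2, z1, z2) = (P(x,y1,z1) & P(x,y2,z2)) -> y1 = y2,
    with the predicate P interpreted as the set of triples P."""
    return lambda x, y1, y2, z1, z2: (
        not ((x, y1, z1) in P and (x, y2, z2) in P) or y1 == y2
    )


def rename(P, f):
    """Isomorphic image of P under the injective renaming f (a dict)."""
    return {tuple(f[v] for v in t) for t in P}


# Hypothetical states; the universe is the set of values that occur.
P = {("e1", "s1", "d1"), ("e2", "s1", "d2")}
Q = {("e1", "s1", "d1"), ("e1", "s2", "d2")}
for state in (P, Q):
    U = {v for t in state for v in t}
    f = {v: i for i, v in enumerate(sorted(U))}  # an injective renaming
    print(true_forall(U, 5, fd_sentence(state)),
          true_forall(set(f.values()), 5, fd_sentence(rename(state, f))))
# P prints: True True
# Q prints: False False
```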


A.6. Dependencies, Again

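The dependencies can be written as first order statements. Let P be a 3-ary predicate, and read the atomic formula $P(x, y, z)$ as asserting that the tuple $\langle x, y, z \rangle$ appears in the instance of XYZ.

(fd $X \to Y$)

$$ (\forall x\, y_1 y_2 z_1 z_2)\big( (P(x, y_1, z_1) \wedge P(x, y_2, z_2)) \Rightarrow y_1 = y_2 \big) $$

(mvd $X \twoheadrightarrow Y$)

$$ (\forall x\, y_1 y_2 z_1 z_2)\big( (P(x, y_1, z_1) \wedge P(x, y_2, z_2)) \Rightarrow P(x, y_1, z_2) \big) $$

(jd $*[XY, YZ, XZ]$)

$$ (\forall x_1 x_2 y_1 y_2 z_1 z_2)\big( (P(x_1, y_1, z_2) \wedge P(x_2, y_1, z_1) \wedge P(x_1, y_2, z_1)) \Rightarrow P(x_1, y_1, z_1) \big) $$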

We leave it to the reader to convince himself that a 3-ary relation instance satisfies the given fd, mvd or jd precisely when it is a model of the corresponding sentence.
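A bounded version of this exercise can be run mechanically. The Python sketch below (hypothetical names) evaluates the mvd and jd sentences above on every relation over the domain {0, 1} and compares each result with the projection-join characterizations $r = \pi_{XY}(r) * \pi_{XZ}(r)$ and $r = \pi_{XY}(r) * \pi_{YZ}(r) * \pi_{XZ}(r)$. Agreement on these 256 instances is a sanity check, not a proof.

```python
from itertools import product


def mvd_sentence(r):
    # (forall x y1 y2 z1 z2)((P(x,y1,z1) & P(x,y2,z2)) -> P(x,y1,z2))
    return all((x, y1, z2) in r
               for (x, y1, _z1) in r
               for (x2, _y2, z2) in r if x2 == x)


def jd_sentence(r):
    # (forall x1 x2 y1 y2 z1 z2)
    #   ((P(x1,y1,z2) & P(x2,y1,z1) & P(x1,y2,z1)) -> P(x1,y1,z1))
    return all((x1, y1, z1) in r
               for (x1, y1, _z2) in r
               for (_x2, y1b, z1) in r if y1b == y1
               for (x1b, _y2, z1b) in r if x1b == x1 and z1b == z1)


def join_xy_xz(r):
    """pi_XY(r) * pi_XZ(r) for a set r of triples."""
    xy = {(x, y) for x, y, _ in r}
    xz = {(x, z) for x, _, z in r}
    return {(x, y, z) for (x, y) in xy for (x2, z) in xz if x2 == x}


def join_xy_yz_xz(r):
    """pi_XY(r) * pi_YZ(r) * pi_XZ(r) for a set r of triples."""
    xy = {(x, y) for x, y, _ in r}
    yz = {(y, z) for _, y, z in r}
    xz = {(x, z) for x, _, z in r}
    return {(x, y, z) for (x, y) in xy for (y2, z) in yz
            if y2 == y and (x, z) in xz}


def sentences_match_definitions():
    """Compare sentence truth with the join characterization on all 2**8 instances."""
    cells = list(product((0, 1), repeat=3))
    for mask in range(1 << len(cells)):
        r = {c for i, c in enumerate(cells) if mask >> i & 1}
        if mvd_sentence(r) != (r == join_xy_xz(r)):
            return False
        if jd_sentence(r) != (r == join_xy_yz_xz(r)):
            return False
    return True


print(sentences_match_definitions())  # True
```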


We may generalize the notion of dependency still further. Consider any language with a single predicate letter P (and equality). Consider any sentence of the form


$$ (\forall x_1 \cdots x_k)(\exists y_1 \cdots y_t)\big( A_1 \wedge \cdots \wedge A_m \Rightarrow B_1 \wedge \cdots \wedge B_n \big) \qquad (*) $$


where

• $k, m, n \ge 1$; $t \ge 0$

• the A's are atomic formulae on P

• the B's are atomic formulae either on P or on equality (those on equality are called equality assertions)

• only and all of the x's (i.e., the universally quantified variables) appear among the variables of the A's

• the sentence contains no constants

• the sentence is typed; that is, a variable appears in position i of one atomic formula on P and position j of another only if $i = j$, and $x_i = x_j$ appears as a formula only if $x_i$ and $x_j$ appear in the same position of some pair of A's (note: the reader may convince himself that equality assertions involving existentially bound variables are useless).

Sentences of this form are called implicational dependencies [F3]. We can simplify them slightly.

If σ is of the form (*), reorder its right-hand side, for convenience, so that for some j, $0 \le j \le n$,

• each of $B_1$ through $B_j$ is an atomic formula on P

• each of $B_{j+1}$ through $B_n$ is an equality assertion

Then σ is equivalent to the conjunction of the sentences

$$ (\forall x_1 \cdots x_k)(\exists y_1 \cdots y_t)\big( A_1 \wedge \cdots \wedge A_m \Rightarrow B_1 \wedge \cdots \wedge B_j \big) \qquad (1) $$

$$ (\forall x_1 \cdots x_k)\big( A_1 \wedge \cdots \wedge A_m \Rightarrow B_p \big) \qquad (j < p \le n) $$

Sentences  of  the  type  of  (1)  are  called  tuple  generating  dependencies  or  tgd's. 

Join dependencies are a special type of tgd. If $t > 0$, that is, if there are existentially bound variables, presumptively appearing within the B's, then a tgd is called partial or embedded (this latter term has, regrettably, two distinct meanings).

The remaining sentences, whose right-hand sides are equality assertions, are called equality generating dependencies or egd's. Functional dependencies are a type of egd.
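The split into a tgd and egd's can be exercised with a small model checker. The Python sketch below assumes atoms are tuples of variable names over a single 3-ary relation, that the universally quantified variables are exactly those of the left-hand side, and that existential variables occur only in right-hand atoms (so equalities mention only x's, as the typing condition permits). The names `assignments`, `satisfies` and `split` are hypothetical. `split` mirrors the decomposition of (*) into the tgd (1) and one egd per equality assertion.

```python
def assignments(atoms, r, h):
    """Yield every extension of the assignment h (a dict) under which each
    atom, a tuple of variable names, maps onto some tuple of r."""
    if not atoms:
        yield h
        return
    first, rest = atoms[0], atoms[1:]
    for t in r:
        g = dict(h)
        # bind unbound variables; bound ones must agree with the tuple
        if all(g.setdefault(v, val) == val for v, val in zip(first, t)):
            yield from assignments(rest, r, g)


def satisfies(r, lhs, rhs_atoms=(), rhs_eqs=()):
    """r is a model of (forall xs)(exists ys)(lhs -> rhs_atoms & rhs_eqs)."""
    for h in assignments(list(lhs), r, {}):
        if not all(h[a] == h[b] for a, b in rhs_eqs):
            return False
        # some extension of h to the existential variables must exist
        if next(assignments(list(rhs_atoms), r, h), None) is None:
            return False
    return True


def split(lhs, rhs_atoms, rhs_eqs):
    """One tgd carrying the atoms, plus one egd per equality assertion."""
    tgd = (lhs, rhs_atoms, ())
    egds = [(lhs, (), (eq,)) for eq in rhs_eqs]
    return tgd, egds


LHS = [("x", "y1", "z1"), ("x", "y2", "z2")]
BOTH = (LHS, [("x", "y1", "z2")], [("y1", "y2")])  # fd and mvd in one sentence
EMB = ([("x1", "y1", "z1"), ("x2", "y2", "z2")], [("x1", "w", "z2")], ())  # w existential

r = {("a", "b", "c"), ("a", "b", "d")}
s = {("a", "b", "c"), ("a", "e", "d")}
print(satisfies(r, *BOTH), satisfies(s, *BOTH))  # True False
tgd, egds = split(*BOTH)
print(satisfies(s, *tgd) and all(satisfies(s, *e) for e in egds))  # False
```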


References 


[A] Armstrong,W.W.;"Dependency Structures of Data Base Relationships," Proc. of IFIP 1974, North-Holland, 580-583

[ABU] Aho,A.V.;Beeri,C.;Ullman,J.;"The Theory of Joins in Relational Databases," TODS 4,3 (1979) 297-314

[AC] Arora,A.K.;Carlson,C.R.;"The Information Preserving Properties of Relational Database Transformations," Proc. of VLDB 1978 352-359

[ASU] Aho,A.V.;Sagiv,Y.;Ullman,J.;"Equivalence Among Relational Expressions," SIAM J. Computing 8,2 (1979) 218-246

[B] Bernstein,P.;"Synthesizing Third Normal Form Relations from Functional Dependencies," TODS 1,4 (1976) 277-298

[Bach] Bachman,C.W.;"Data Structure Diagrams," Data Base (1,2), pp. 4-10

[BB] Beeri,C.;Bernstein,P.;"Computational Problems Related to the Design of Normal Form Relational Schemas," TODS 4,1 (1979) 30-59

[BBG] Beeri,C.;Bernstein,P.;Goodman,N.;"A Sophisticate's introduction to database normalization theory," Proc. of VLDB 78 pp. 113-124

[BDB] Biskup,J.;Dayal,U.;Bernstein,P.;"Synthesizing Independent Database Schemas," SIGMOD 79 143-151

[BK] Beeri,C.;Honeyman,P.;"Preserving Functional Dependencies," to appear, SIAM J. Comput.

[BMSU] Beeri,C.;Mendelzon,A.;Sagiv,Y.;Ullman,J.;"Equivalence of Relational Database Schemes," TR-252 (1978) Princeton University

[BR] Beeri,C.;Rissanen,J.;"Faithful Representations of Relational Database Schemes," IBM RJ2722 1980

[BV1] Beeri,C.;Vardi,M.Y.;"Decision Problems for Data Dependencies," Dept. of Computer Sciences, Hebrew University of Jerusalem, May 1980

[BV2] Beeri,C.;Vardi,M.Y.;"A proof procedure for data dependencies," Dept. of Computer Sciences, Hebrew University of Jerusalem, Dec. 1980

[CKM] Chandra,A.K.;Lewis,H.R.;Makowsky,J.A.;"Embedded Implicational Dependencies and their decision problem," Proc. XP1 1980

[CM] Chandra,A.K.;Merlin,P.M.;"Optimal Implementation of Conjunctive Queries in Relational Data Bases," Proc. Ninth Annual ACM Symp. on Theory of Comp., (May 1976) pp. 77-90

[COD78] CODASYL Data Description Language Journal of Development, Material Data Management Branch, Dept. of Supply and Services, Ottawa

[C1] Codd,E.F.;"A relational model of data for large shared data banks," CACM 13 pp. 377-387

[C2] Codd,E.F.;"Further Normalization of the Data Base Relational Model," in [Rustin], pp. 33-64

[C3] Codd,E.F.;"Relational completeness of data base sublanguages," in [Rustin], pp. 65-99

[DST] Downey,P.J.;Sethi,R.;Tarjan,R.E.;"Variations on the Common Subexpression Problem," JACM 27,4 (Oct. 1980), pp. 758-772

[Enderton] Enderton,H.B.; A Mathematical Introduction to Logic, Academic Press, 1972



[FMU] Fagin,R.;Mendelzon,A.O.;Ullman,J.D.;"A simplified Universal Relation Assumption and its properties," IBM RJ2900 Nov. 1980

[F1] Fagin,R.;"The decomposition versus synthetic approach to relational database design," Proc. of VLDB 77 441-446

[F2] Fagin,R.;"Multivalued dependencies and a new normal form for relational databases," TODS (1977) vol. 2, pp. 262-278

[F3] Fagin,R.;"Normal Forms and Relational Database Operators," Proc. SIGMOD 79 pp. 153-160

[F4] Fagin,R.;"Horn Clauses and Data Dependencies," IBM RJ2741 (1980)

[F5] Fagin,R.;"A Normal Form for Relational Databases that is Based on Domains and Keys," IBM RJ2520 (1980)

[H] Honeyman,P.;"Testing Satisfaction of Functional Dependencies," to appear, JACM

[Ha] Harary,F.; Graph Theory, Addison-Wesley, (1969)

[HLY] Honeyman,P.;Ladner,R.;Yannakakis,M.;"Testing the Universal Instance Assumption," Information Processing Letters 10:1 (1980) 14-19

[K] Karp,R.M.;"Reducibility among combinatorial problems," in Miller and Thatcher (eds.), Complexity of Computer Computations, Plenum Press, 1972, pp. 85-103

[KlKo] Klimbie,J.W.;Koffman,K.L.; Data Base Management, North Holland, 1974

[Lang] Langefors,B.;"Some approaches to the theory of information systems," BIT 3 (1963), pp. 229-254

[LT] Ling,T.;Tompa,F.;"Implicit Constraints Within Relational Data Bases," DCS, University of Waterloo, CS-78-35 1978

[Mend] Mendelzon,A.O.;"Database States and their tableaux," Proc. XP2 1981

[MMS] Maier,D.;Mendelzon,A.;Sagiv,Y.;"Testing Implications of Data Dependencies," TODS 4,4 (1979) 455-469

[MU] Maier,D.;Ullman,J.D.;"Maximal Objects and the semantics of universal relation databases," TR-80-016, Dept. of C.S., SUNY Stony Brook, 1980

[Parker] Atzeni,P.;Parker,D.S.;"Properties of Acyclic Database Schemes: An Analysis," Proc. XP2 1981

[R] Rissanen,J.;"Theory of relations for databases - A tutorial survey," Proc. 7th Symp. MFCS (1978), pp. 537-581

[R1] Rissanen,J.;"Independent Components of Relations," TODS 2,4 (Dec. 77), pp. 317-325

[Rustin] Rustin,R. (ed.); Data Base Systems, Prentice-Hall, 1972

[S] Sethi,R.;"Testing for the Church-Rosser Property," JACM 21,4 (1974) pp. 671-679

[Sag] Sagiv,Y.;"Can we use the universal instance assumption without using nulls?" Proc. SIGMOD (1981), pp. 108-120

[Sc] Sciore,E.;"Real World Mvd's," Proc. SIGMOD 81 pp. 121-132

[Sc2] Sciore,E.;"Improving Semantic Specification in a Relational Database," Proc. SIGMOD 79 pp. 170-173

[Suppes] Suppes,P.; Axiomatic Set Theory, Van Nostrand, 1960




[TL1] Tsichritzis,D.;Lochovsky,F.; Data Base Management Systems, Academic Press, 1977

[TL] Tsichritzis,D.;Lochovsky,F.; Data Models, Prentice Hall, 1982

[U] Ullman,J.; Principles of Database Systems, Computer Science Press, (1980)

[V] Vassiliou,Y.;"A Formal Treatment of Imperfect Information in Database Management," Ph.D. Thesis, University of Toronto, (1980)


University  of  Toronto 

Computer  Systems  Research  Group 


BIBLIOGRAPHY  OF  CSRG  TECHNICAL  REPORTS  1980  -  present 

*  -  Out  of  print 

* CSRG-100 DIALOGUE ORGANISATION AND STRUCTURE FOR
INTERACTIVE INFORMATION SYSTEMS
John Leonard Barron
[M.Sc. Thesis, DCS, 1980]

* CSRG-109 A UNIFYING MODEL OF PHYSICAL DATABASES
D.S. Batory, C.C. Gotlieb, April 1980

*  CSRG-110  OPTIMAL  FILE  DESIGNS  AND  REORGANIZATION  POINTS 

D.S.  Batory,  April  1980 

*  CSRG-111  A  PANACHE  OF  DBMS  IDEAS  III 

D.  Tsichritzis  (ed.),  April  1980 

CSRG-112  TOPICS  IN  PSN  -  II:  EXCEPTIONAL  CONDITION 

HANDLING  IN  PSN;  REPRESENTING  PROGRAMS  IN  PSN; 
CONTENTS  IN  PSN 

Yves  Lesperance,  Byron  M.  Kramer,  Peter  F.  Schneider 
April,  1980 

CSRG-113  SYSTEM-ORIENTED  MACRO-SCHEDULING 
C.C.  Gotlieb  and  A.  Schonbach 
May  1980 

CSRG-114 A FRAMEWORK FOR VISUAL MOTION UNDERSTANDING
John Konstantine Tsotsos
[Ph.D. Thesis, DCS, June 1980]

CSRG-115 SPECIFICATION OF CONCURRENT EUCLID
James R. Cordy and Richard C. Holt
July 1980

CSRG-116 THE REPRESENTATION OF PROGRAMS IN THE
PROCEDURAL SEMANTIC NETWORK FORMALISM
Bryan M. Kramer
[M.Sc. Thesis, DCS, 1980]

CSRG-117 CONTEXT-FREE GRAMMARS AND DERIVATION TREES AS
PROGRAMMING TOOLS
Volker Linnemann
September 1980

CSRG-118 S/SL: SYNTAX/SEMANTIC LANGUAGE
INTRODUCTION AND SPECIFICATION
R.C. Holt, J.R. Cordy, D.B. Wortman
CSRG, September 1980




CSRG-119  PT:  A  PASCAL  SUBSET 
Alan  Rosselet 

[M.Sc.  Thesis,  DCS,  October  1980] 

CSRG-120  PTED:  A  STANDARD  PASCAL  TEXT  EDITOR  BASED  ON 
THE  KERNIGHAN  AND  PLAUGER  DESIGN 
Ken  Newman,  DCS 
October 1980

CSRG-121  TERMINAL  CONTEXT  GRAMMARS 
Howard  W.  Trickey 
[M.Sc.  Thesis,  EE,  September  1980] 

CSRG-122  THE  APPROXIMATE  SOLUTION  OF  LARGE  QUEUEING 
NETWORK  MODELS 
John  Zahorjan 

[Ph.D.  Thesis,  DCS,  August  1980] 

CSRG-123  A  FORMAL  TREATMENT  OF  IMPERFECT  INFORMATION 
IN  DATABASE  MANAGEMENT 
Yannis  Vassiliou 

[Ph.D.  Thesis,  DCS,  September  1980] 

CSRG-124  AN  ANALYTIC  MODEL  OF  PHYSICAL  DATABASES 
Don  S.  Batory 

[Ph.D. Thesis, DCS, January 1981]

CSRG-125  MACHINE-INDEPENDENT  CODE  GENERATION 
Richard  H.  Kozlak 
[M.Sc.  Thesis,  DCS,  January  1981] 

CSRG-126  COMPUTER  MACRO-SCHEDULING  FOR  HIGH  PRODUCTIVITY 
Abraham  Schonbach 
[Ph.D.  Thesis,  DCS,  March  1981] 

CSRG-127 OMEGA ALPHA

D.  Tsichritzis  (ed.),  March  1981 

CSRG-128  DIALOGUE  AND  PROCESS  DESIGN  FOR  INTERACTIVE 
INFORMATION  SYSTEMS  USING  TAXIS 
John  Barron,  April  1981 

CSRG-129  DESIGN  AND  VERIFICATION  OF  INTERACTIVE  INFORMATION 
SYSTEMS  USING  TAXIS 
Harry  K.T.  Wong 

[Ph.D.  Thesis,  DCS,  to  be  submitted] 

CSRG-130  DYNAMIC  PROTECTION  OF  OBJECTS  IN  A  COMPUTER  UTILITY 
Leslie  H.  Goldsmith,  April,  1981 

CSRG-131  INTEGRITY  ANALYSIS:  A  METHODOLOGY  FOR  EDP  AUDIT 
AND  DATA  QUALITY  CONTROL 
Maija  Irene  Svanks 
[Ph.D.  Thesis,  DCS,  February  1981] 




CSRG-132  A  PROTOTYPE  KNOWLEDGE-BASED  SYSTEM 

FOR  COMPUTER-ASSISTED  MEDICAL  DIAGNOSIS 

Stephen  A.  Ho-Tai 

[M.Sc. Thesis, DCS, January 1981]

CSRG-133  SPECIFICATION  OF  CONCURRENT  EUCLID 
James  R.  Cordy,  Richard  C.  Holt 
August 1981 (Version 1)

CSRG-134  ANOTHER  LOOK  AT  COMMUNICATING  PROCESSES 
E.C.R.  Hehner  and  C.A.R.  Hoare,  July,  1981 

CSRG-135  ROBUST  CONCURRENCY  CONTROL  IN  DISTRIBUTED  DATABASES 
Derek  L.  Eager 

[M.Sc.  Thesis,  DCS,  October  1981] 

CSRG-136  ESTIMATING  SELECTIVITIES  IN  DATA  BASES 
Stavros  Christodoulakis 
[Ph.D.  Thesis,  DCS,  December  1981] 

CSRG-137  SATISFYING  DATABASE  STATES 
Marc  H.  Graham 

[Ph.D.  Thesis,  DCS,  December  1981] 


